Overview

Dataset statistics

Number of variables: 69
Number of observations: 42497
Missing cells: 838314
Missing cells (%): 28.6%
Duplicate rows: 1364
Duplicate rows (%): 3.2%
Total size in memory: 19.3 MiB
Average record size in memory: 475.0 B
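
The two percentages above can be recovered from the raw counts. The following is a minimal sketch, assuming the usual definitions (missing cells over all cells, duplicate rows over all rows); the variable names are illustrative and not taken from the report.

```python
# Counts copied from the dataset statistics above.
n_variables = 69
n_observations = 42497
missing_cells = 838314
duplicate_rows = 1364

# Assumed definitions: share of all cells that are missing,
# and share of all rows that are flagged as duplicates.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the results round to the reported 28.6% and 3.2%.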

Variable types

Categorical: 45
Numeric: 9
Boolean: 11
Unsupported: 4

Alerts

Dataset has 1364 (3.2%) duplicate rows [Duplicates]
Ref_publication_date has a high cardinality: 1890 distinct values [High cardinality]
Substrate_stack_sequence has a high cardinality: 194 distinct values [High cardinality]
ETL_stack_sequence has a high cardinality: 1444 distinct values [High cardinality]
ETL_thickness has a high cardinality: 1419 distinct values [High cardinality]
ETL_additives_compounds has a high cardinality: 447 distinct values [High cardinality]
ETL_deposition_procedure has a high cardinality: 368 distinct values [High cardinality]
Perovskite_composition_a_ions has a high cardinality: 264 distinct values [High cardinality]
Perovskite_composition_a_ions_coefficients has a high cardinality: 565 distinct values [High cardinality]
Perovskite_composition_b_ions has a high cardinality: 53 distinct values [High cardinality]
Perovskite_composition_b_ions_coefficients has a high cardinality: 148 distinct values [High cardinality]
Perovskite_composition_c_ions_coefficients has a high cardinality: 370 distinct values [High cardinality]
Perovskite_additives_compounds has a high cardinality: 979 distinct values [High cardinality]
Perovskite_additives_concentrations has a high cardinality: 706 distinct values [High cardinality]
Perovskite_deposition_procedure has a high cardinality: 198 distinct values [High cardinality]
Perovskite_deposition_synthesis_atmosphere has a high cardinality: 186 distinct values [High cardinality]
Perovskite_deposition_solvents has a high cardinality: 343 distinct values [High cardinality]
Perovskite_deposition_solvents_mixing_ratios has a high cardinality: 528 distinct values [High cardinality]
Perovskite_deposition_quenching_media has a high cardinality: 94 distinct values [High cardinality]
Perovskite_deposition_quenching_media_additives_compounds has a high cardinality: 84 distinct values [High cardinality]
Perovskite_deposition_thermal_annealing_temperature has a high cardinality: 870 distinct values [High cardinality]
Perovskite_deposition_thermal_annealing_time has a high cardinality: 766 distinct values [High cardinality]
Perovskite_deposition_after_treatment_of_formed_perovskite has a high cardinality: 103 distinct values [High cardinality]
HTL_stack_sequence has a high cardinality: 1959 distinct values [High cardinality]
HTL_thickness_list has a high cardinality: 439 distinct values [High cardinality]
HTL_additives_compounds has a high cardinality: 380 distinct values [High cardinality]
HTL_additives_concentrations has a high cardinality: 223 distinct values [High cardinality]
HTL_deposition_procedure has a high cardinality: 120 distinct values [High cardinality]
HTL_deposition_solvents has a high cardinality: 51 distinct values [High cardinality]
Backcontact_stack_sequence has a high cardinality: 289 distinct values [High cardinality]
Backcontact_thickness_list has a high cardinality: 384 distinct values [High cardinality]
Backcontact_deposition_procedure has a high cardinality: 113 distinct values [High cardinality]
Encapsulation_stack_sequence has a high cardinality: 118 distinct values [High cardinality]
Cell_area_measured is highly overall correlated with Module_area_total and 8 other fields [High correlation]
Module_area_total is highly overall correlated with Cell_area_measured and 7 other fields [High correlation]
Perovskite_deposition_number_of_deposition_steps is highly overall correlated with JV_scan_integration_time [High correlation]
JV_light_intensity is highly overall correlated with ETL_surface_treatment_before_next_deposition_step and 4 other fields [High correlation]
JV_scan_speed is highly overall correlated with ETL_surface_treatment_before_next_deposition_step and 6 other fields [High correlation]
JV_scan_delay_time is highly overall correlated with Cell_area_measured and 13 other fields [High correlation]
JV_preconditioning_time is highly overall correlated with Module_area_total and 18 other fields [High correlation]
JV_preconditioning_potential is highly overall correlated with Cell_flexible and 11 other fields [High correlation]
JV_default_PCE is highly overall correlated with Perovskite_surface_treatment_before_next_deposition_step [High correlation]
Cell_architecture is highly overall correlated with JV_scan_integration_time [High correlation]
Cell_flexible is highly overall correlated with JV_preconditioning_time and 4 other fields [High correlation]
Module is highly overall correlated with JV_scan_delay_time and 3 other fields [High correlation]
ETL_surface_treatment_before_next_deposition_step is highly overall correlated with Cell_area_measured and 19 other fields [High correlation]
Perovskite_dimension_2D is highly overall correlated with Module_area_total and 9 other fields [High correlation]
Perovskite_dimension_3D is highly overall correlated with JV_scan_delay_time and 5 other fields [High correlation]
Perovskite_dimension_3D_with_2D_capping_layer is highly overall correlated with JV_scan_delay_time and 8 other fields [High correlation]
Perovskite_composition_perovskite_ABC3_structure is highly overall correlated with JV_scan_delay_time and 7 other fields [High correlation]
Perovskite_composition_b_ions is highly overall correlated with JV_preconditioning_time and 6 other fields [High correlation]
Perovskite_composition_c_ions is highly overall correlated with Perovskite_dimension_3D_with_2D_capping_layer and 1 other field [High correlation]
Perovskite_composition_none_stoichiometry_components_in_excess is highly overall correlated with Cell_area_measured and 5 other fields [High correlation]
Perovskite_band_gap_graded is highly overall correlated with Module_area_total and 8 other fields [High correlation]
Perovskite_deposition_quenching_induced_crystallisation is highly overall correlated with JV_preconditioning_time and 4 other fields [High correlation]
Perovskite_deposition_quenching_media is highly overall correlated with JV_preconditioning_time and 4 other fields [High correlation]
Perovskite_deposition_quenching_media_additives_compounds is highly overall correlated with Cell_area_measured and 17 other fields [High correlation]
Perovskite_deposition_thermal_annealing_atmosphere is highly overall correlated with ETL_surface_treatment_before_next_deposition_step and 1 other field [High correlation]
Perovskite_deposition_solvent_annealing is highly overall correlated with JV_preconditioning_time and 3 other fields [High correlation]
Perovskite_deposition_solvent_annealing_solvent_atmosphere is highly overall correlated with JV_scan_delay_time and 5 other fields [High correlation]
Perovskite_surface_treatment_before_next_deposition_step is highly overall correlated with Cell_area_measured and 18 other fields [High correlation]
HTL_deposition_synthesis_atmosphere is highly overall correlated with ETL_surface_treatment_before_next_deposition_step and 2 other fields [High correlation]
HTL_deposition_solvents is highly overall correlated with Perovskite_surface_treatment_before_next_deposition_step and 4 other fields [High correlation]
HTL_deposition_solvents_mixing_ratios is highly overall correlated with Cell_area_measured and 13 other fields [High correlation]
Add_lay_front_stack_sequence is highly overall correlated with JV_scan_delay_time and 7 other fields [High correlation]
Add_lay_back is highly overall correlated with JV_scan_delay_time and 9 other fields [High correlation]
Encapsulation is highly overall correlated with JV_preconditioning_time and 4 other fields [High correlation]
JV_light_spectra is highly overall correlated with JV_preconditioning_time and 7 other fields [High correlation]
JV_scan_integration_time is highly overall correlated with Cell_area_measured and 25 other fields [High correlation]
JV_preconditioning_protocol is highly overall correlated with Cell_area_measured and 12 other fields [High correlation]
Cell_architecture is highly imbalanced (68.0%) [Imbalance]
Cell_flexible is highly imbalanced (83.0%) [Imbalance]
Module is highly imbalanced (93.2%) [Imbalance]
Substrate_stack_sequence is highly imbalanced (83.1%) [Imbalance]
ETL_stack_sequence is highly imbalanced (51.9%) [Imbalance]
ETL_additives_compounds is highly imbalanced (84.1%) [Imbalance]
ETL_deposition_procedure is highly imbalanced (56.2%) [Imbalance]
Perovskite_dimension_2D is highly imbalanced (83.9%) [Imbalance]
Perovskite_dimension_3D is highly imbalanced (79.7%) [Imbalance]
Perovskite_dimension_3D_with_2D_capping_layer is highly imbalanced (95.9%) [Imbalance]
Perovskite_composition_perovskite_ABC3_structure is highly imbalanced (86.7%) [Imbalance]
Perovskite_composition_a_ions is highly imbalanced (75.3%) [Imbalance]
Perovskite_composition_a_ions_coefficients is highly imbalanced (74.1%) [Imbalance]
Perovskite_composition_b_ions is highly imbalanced (92.3%) [Imbalance]
Perovskite_composition_b_ions_coefficients is highly imbalanced (91.3%) [Imbalance]
Perovskite_composition_c_ions is highly imbalanced (80.0%) [Imbalance]
Perovskite_composition_c_ions_coefficients is highly imbalanced (75.1%) [Imbalance]
Perovskite_composition_none_stoichiometry_components_in_excess is highly imbalanced (54.4%) [Imbalance]
Perovskite_band_gap_graded is highly imbalanced (98.5%) [Imbalance]
Perovskite_deposition_procedure is highly imbalanced (73.7%) [Imbalance]
Perovskite_deposition_synthesis_atmosphere is highly imbalanced (63.3%) [Imbalance]
Perovskite_deposition_solvents is highly imbalanced (58.6%) [Imbalance]
Perovskite_deposition_solvents_mixing_ratios is highly imbalanced (58.6%) [Imbalance]
Perovskite_deposition_quenching_media is highly imbalanced (67.1%) [Imbalance]
Perovskite_deposition_thermal_annealing_atmosphere is highly imbalanced (94.2%) [Imbalance]
Perovskite_deposition_solvent_annealing is highly imbalanced (85.3%) [Imbalance]
Perovskite_deposition_solvent_annealing_solvent_atmosphere is highly imbalanced (96.7%) [Imbalance]
Perovskite_deposition_after_treatment_of_formed_perovskite is highly imbalanced (56.2%) [Imbalance]
HTL_stack_sequence is highly imbalanced (64.7%) [Imbalance]
HTL_additives_compounds is highly imbalanced (73.8%) [Imbalance]
HTL_deposition_procedure is highly imbalanced (82.9%) [Imbalance]
HTL_deposition_synthesis_atmosphere is highly imbalanced (94.6%) [Imbalance]
HTL_deposition_solvents is highly imbalanced (94.0%) [Imbalance]
HTL_deposition_solvents_mixing_ratios is highly imbalanced (66.1%) [Imbalance]
Backcontact_stack_sequence is highly imbalanced (71.6%) [Imbalance]
Backcontact_thickness_list is highly imbalanced (60.2%) [Imbalance]
Backcontact_deposition_procedure is highly imbalanced (84.7%) [Imbalance]
Add_lay_front_stack_sequence is highly imbalanced (99.1%) [Imbalance]
Add_lay_back is highly imbalanced (99.2%) [Imbalance]
Encapsulation is highly imbalanced (75.3%) [Imbalance]
Encapsulation_stack_sequence is highly imbalanced (95.2%) [Imbalance]
JV_light_spectra is highly imbalanced (98.6%) [Imbalance]
JV_preconditioning_protocol is highly imbalanced (55.6%) [Imbalance]
Cell_area_measured has 711 (1.7%) missing values [Missing]
Module_area_total has 42165 (99.2%) missing values [Missing]
ETL_thickness has 23933 (56.3%) missing values [Missing]
ETL_additives_compounds has 3293 (7.7%) missing values [Missing]
ETL_surface_treatment_before_next_deposition_step has 42330 (99.6%) missing values [Missing]
Perovskite_composition_none_stoichiometry_components_in_excess has 35521 (83.6%) missing values [Missing]
Perovskite_additives_compounds has 28675 (67.5%) missing values [Missing]
Perovskite_additives_concentrations has 37606 (88.5%) missing values [Missing]
Perovskite_thickness has 29124 (68.5%) missing values [Missing]
Perovskite_band_gap has 10585 (24.9%) missing values [Missing]
Perovskite_pl_max has 32254 (75.9%) missing values [Missing]
Perovskite_deposition_solvents_mixing_ratios has 2125 (5.0%) missing values [Missing]
Perovskite_deposition_quenching_media_mixing_ratios has 41781 (98.3%) missing values [Missing]
Perovskite_deposition_quenching_media_additives_compounds has 41618 (97.9%) missing values [Missing]
Perovskite_deposition_after_treatment_of_formed_perovskite has 40873 (96.2%) missing values [Missing]
Perovskite_surface_treatment_before_next_deposition_step has 42493 (> 99.9%) missing values [Missing]
HTL_thickness_list has 32425 (76.3%) missing values [Missing]
HTL_additives_compounds has 18371 (43.2%) missing values [Missing]
HTL_additives_concentrations has 41462 (97.6%) missing values [Missing]
HTL_deposition_solvents_mixing_ratios has 41988 (98.8%) missing values [Missing]
Backcontact_thickness_list has 7976 (18.8%) missing values [Missing]
JV_light_spectra has 2491 (5.9%) missing values [Missing]
JV_scan_speed has 26362 (62.0%) missing values [Missing]
JV_scan_delay_time has 42370 (99.7%) missing values [Missing]
JV_scan_integration_time has 42477 (> 99.9%) missing values [Missing]
JV_preconditioning_protocol has 41497 (97.6%) missing values [Missing]
JV_preconditioning_time has 42279 (99.5%) missing values [Missing]
JV_preconditioning_potential has 42362 (99.7%) missing values [Missing]
JV_default_PCE has 925 (2.2%) missing values [Missing]
Cell_area_measured is highly skewed (γ1 = 193.1041949) [Skewed]
JV_light_intensity is highly skewed (γ1 = 71.51583535) [Skewed]
JV_scan_speed is highly skewed (γ1 = 122.3392709) [Skewed]
Perovskite_surface_treatment_before_next_deposition_step is uniformly distributed [Uniform]
Perovskite_thickness is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Perovskite_band_gap is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Perovskite_pl_max is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Perovskite_deposition_quenching_media_mixing_ratios is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-05-05 12:07:26.182989
Analysis finished: 2023-05-05 12:07:57.994062
Duration: 31.81 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
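
The reported duration is the difference between the two timestamps above. A small sketch using only the standard library (the format string is an assumption based on how the timestamps are printed):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-05-05 12:07:26.182989", fmt)
finished = datetime.strptime("2023-05-05 12:07:57.994062", fmt)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```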

Variables

Ref_publication_date
Categorical

HIGH CARDINALITY 

Distinct: 1890
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 332.1 KiB

2017-05-15: 161
2016-04-28: 124
2018-12-13: 113
2018-10-11: 113
2018-01-04: 106
Other values (1885): 41880

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 424970
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
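
As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# One date string in the format used by this variable.
counts = category_counts(["2017-05-15"])
print(dict(counts))
```

For a single date string this gives eight `Nd` (Decimal Number) and two `Pd` (Dash Punctuation) characters, the same 80/20 split the category table reports for the whole column.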

Unique

Unique: 52
Unique (%): 0.1%

Sample

1st row: 2015-01-06
2nd row: 2015-01-06
3rd row: 2015-01-06
4th row: 2015-01-06
5th row: 2015-01-06

Common Values

Value                Count  Frequency (%)
2017-05-15           161    0.4%
2016-04-28           124    0.3%
2018-12-13           113    0.3%
2018-10-11           113    0.3%
2018-01-04           106    0.2%
2017-08-23           105    0.2%
2019-04-15           105    0.2%
2019-03-26           104    0.2%
2019-01-11           104    0.2%
2018-04-24           101    0.2%
Other values (1880)  41361  97.3%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value  Count  Frequency (%)
0      95874  22.6%
-      84994  20.0%
1      78206  18.4%
2      68402  16.1%
8      18889  4.4%
9      18142  4.3%
7      15875  3.7%
6      14135  3.3%
5      11366  2.7%
3      9740   2.3%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    339976  80.0%
Dash Punctuation  84994   20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      95874  28.2%
1      78206  23.0%
2      68402  20.1%
8      18889  5.6%
9      18142  5.3%
7      15875  4.7%
6      14135  4.2%
5      11366  3.3%
3      9740   2.9%
4      9347   2.7%

Dash Punctuation
Value  Count  Frequency (%)
-      84994  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  424970  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      95874  22.6%
-      84994  20.0%
1      78206  18.4%
2      68402  16.1%
8      18889  4.4%
9      18142  4.3%
7      15875  3.7%
6      14135  3.3%
5      11366  2.7%
3      9740   2.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  424970  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      95874  22.6%
-      84994  20.0%
1      78206  18.4%
2      68402  16.1%
8      18889  4.4%
9      18142  4.3%
7      15875  3.7%
6      14135  3.3%
5      11366  2.7%
3      9740   2.3%

Cell_area_measured
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED 

Distinct: 504
Distinct (%): 1.2%
Missing: 711
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.20661691
Minimum: 0.00012
Maximum: 1063
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 332.1 KiB

Quantile statistics

Minimum: 0.00012
5-th percentile: 0.04
Q1: 0.09
Median: 0.1
Q3: 0.12
95-th percentile: 0.3
Maximum: 1063
Range: 1062.9999
Interquartile range (IQR): 0.03

Descriptive statistics

Standard deviation: 5.3022697
Coefficient of variation (CV): 25.662322
Kurtosis: 38637.88
Mean: 0.20661691
Median Absolute Deviation (MAD): 0.01
Skewness: 193.10419
Sum: 8633.6942
Variance: 28.114064
Monotonicity: Not monotonic
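
Several of the statistics above are derived from one another. The sketch below recomputes them from the reported values, assuming the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, mean = sum / non-missing count); the variable names are illustrative.

```python
import math

# Values copied from the Cell_area_measured statistics above.
mean = 0.20661691
std = 5.3022697
total = 8633.6942
q1, q3 = 0.09, 0.12
minimum, maximum = 0.00012, 1063
n_rows, n_missing = 42497, 711

cv = std / mean                               # coefficient of variation
variance = std ** 2                           # variance from standard deviation
iqr = q3 - q1                                 # interquartile range
value_range = maximum - minimum               # range
mean_from_sum = total / (n_rows - n_missing)  # mean over non-missing rows

print(f"CV={cv:.4f} variance={variance:.4f} IQR={iqr:.2f}")
print(f"range={value_range:.4f} mean={mean_from_sum:.6f}")
```

The very large CV and the gap between the median (0.1) and the maximum (1063) are consistent with the SKEWED flag on this variable.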
[Figure: Histogram with fixed size bins (bins=50)]

Most common values:
Value               Count  Frequency (%)
0.1                 12395  29.2%
0.09                6522   15.3%
0.16                3341   7.9%
0.04                1995   4.7%
0.06                1492   3.5%
0.12                1380   3.2%
0.07                995    2.3%
0.08                828    1.9%
1                   566    1.3%
0.2                 488    1.1%
Other values (494)  11784  27.7%
(Missing)           711    1.7%

Smallest values:
Value     Count  Frequency (%)
0.00012   1      < 0.1%
0.0002    4      < 0.1%
0.000364  1      < 0.1%
0.0004    8      < 0.1%
0.0005    4      < 0.1%
0.0006    4      < 0.1%
0.0007    4      < 0.1%
0.000725  3      < 0.1%
0.001     3      < 0.1%
0.002     5      < 0.1%

Largest values:
Value  Count  Frequency (%)
1063   1      < 0.1%
100    1      < 0.1%
70     2      < 0.1%
40     3      < 0.1%
36.1   4      < 0.1%
36     2      < 0.1%
35.8   1      < 0.1%
25     1      < 0.1%
22.4   1      < 0.1%
20.78  1      < 0.1%

Cell_architecture
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 332.1 KiB

nip: 29667
pin: 12776
Back contacted: 33
Unknown: 8
Front contacted: 7
Other values (2): 6

Length

Max length: 17
Median length: 3
Mean length: 3.0121891
Min length: 3

Characters and Unicode

Total characters: 128009
Distinct characters: 24
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: nip
2nd row: nip
3rd row: nip
4th row: nip
5th row: nip

Common Values

Value              Count  Frequency (%)
nip                29667  69.8%
pin                12776  30.1%
Back contacted     33     0.1%
Unknown            8      < 0.1%
Front contacted    7      < 0.1%
Schottky           5      < 0.1%
Pn-Heterojunction  1      < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Value              Count  Frequency (%)
nip                29667  69.7%
pin                12776  30.0%
contacted          40     0.1%
back               33     0.1%
unknown            8      < 0.1%
front              7      < 0.1%
schottky           5      < 0.1%
pn-heterojunction  1      < 0.1%
< 0.1%

Most occurring characters

Value              Count  Frequency (%)
n                  42517  33.2%
i                  42444  33.2%
p                  42443  33.2%
c                  119    0.1%
t                  99     0.1%
a                  73     0.1%
o                  62     < 0.1%
k                  46     < 0.1%
e                  42     < 0.1%
(space)            40     < 0.1%
Other values (14)  124    0.1%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  127913  99.9%
Uppercase Letter  55      < 0.1%
Space Separator   40      < 0.1%
Dash Punctuation  1       < 0.1%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
n                 42517  33.2%
i                 42444  33.2%
p                 42443  33.2%
c                 119    0.1%
t                 99     0.1%
a                 73     0.1%
o                 62     < 0.1%
k                 46     < 0.1%
e                 42     < 0.1%
d                 40     < 0.1%
Other values (6)  28     < 0.1%

Uppercase Letter
Value  Count  Frequency (%)
B      33     60.0%
U      8      14.5%
F      7      12.7%
S      5      9.1%
P      1      1.8%
H      1      1.8%

Space Separator
Value    Count  Frequency (%)
(space)  40     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      1      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   127968  > 99.9%
Common  41      < 0.1%

Most frequent character per script

Latin
Value              Count  Frequency (%)
n                  42517  33.2%
i                  42444  33.2%
p                  42443  33.2%
c                  119    0.1%
t                  99     0.1%
a                  73     0.1%
o                  62     < 0.1%
k                  46     < 0.1%
e                  42     < 0.1%
d                  40     < 0.1%
Other values (12)  83     0.1%

Common
Value    Count  Frequency (%)
(space)  40     97.6%
-        1      2.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  128009  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
n                  42517  33.2%
i                  42444  33.2%
p                  42443  33.2%
c                  119    0.1%
t                  99     0.1%
a                  73     0.1%
o                  62     < 0.1%
k                  46     < 0.1%
e                  42     < 0.1%
(space)            40     < 0.1%
Other values (14)  124    0.1%

Cell_flexible
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.6 KiB

Value  Count  Frequency (%)
False  41425  97.5%
True   1072   2.5%

Module
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.6 KiB

Value  Count  Frequency (%)
False  42151  99.2%
True   346    0.8%
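
The imbalance scores listed in the alerts for the two Boolean variables (Cell_flexible 83.0%, Module 93.2%) are consistent with one minus the normalised Shannon entropy of the class frequencies. The sketch below is a reconstruction under that assumption, not a copy of the tool's code; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy (bits) of the
    class frequencies, k the number of classes. 0 = balanced, 1 = one class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention assumed here for a single-class column
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts copied from the Cell_flexible and Module tables.
cell_flexible = imbalance_score([41425, 1072])
module = imbalance_score([42151, 346])
print(f"Cell_flexible: {cell_flexible:.1%}")
print(f"Module: {module:.1%}")
```

A perfectly balanced column scores 0 under this definition, so values above 0.8 or 0.9 indicate that one class dominates almost completely.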

Module_area_total
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 81
Distinct (%): 24.4%
Missing: 42165
Missing (%): 99.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.132743
Minimum: 0.00024
Maximum: 435
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 332.1 KiB

Quantile statistics

Minimum: 0.00024
5-th percentile: 0.25
Q1: 2.6775
Median: 13.65
Q3: 36.1
95-th percentile: 122.275
Maximum: 435
Range: 434.99976
Interquartile range (IQR): 33.4225

Descriptive statistics

Standard deviation: 76.079615
Coefficient of variation (CV): 1.9441422
Kurtosis: 17.122163
Mean: 39.132743
Median Absolute Deviation (MAD): 12.65
Skewness: 3.9701825
Sum: 12992.071
Variance: 5788.1079
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Most common values:
Value              Count  Frequency (%)
0.25               40     0.1%
25                 35     0.1%
100                25     0.1%
12                 19     < 0.1%
0.5024             12     < 0.1%
10                 12     < 0.1%
10.8               10     < 0.1%
16                 10     < 0.1%
36.1               9      < 0.1%
1                  9      < 0.1%
Other values (71)  151    0.4%
(Missing)          42165  99.2%

Smallest values:
Value     Count  Frequency (%)
0.00024   1      < 0.1%
0.000288  1      < 0.1%
0.00031   1      < 0.1%
0.00096   1      < 0.1%
0.09      1      < 0.1%
0.12      5      < 0.1%
0.2       1      < 0.1%
0.24      1      < 0.1%
0.25      40     0.1%
0.5024    12     < 0.1%

Largest values:
Value   Count  Frequency (%)
435     8      < 0.1%
354.45  2      < 0.1%
231.04  1      < 0.1%
168.75  1      < 0.1%
156.25  4      < 0.1%
149.5   1      < 0.1%
100     25     0.1%
91.8    2      < 0.1%
80.55   8      < 0.1%
80      1      < 0.1%

Substrate_stack_sequence
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 194
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 332.1 KiB

SLG | FTO: 26261
SLG | ITO: 14856
PET | ITO: 374
PEN | ITO: 233
PET | IZO: 62
Other values (189): 711

Length

Max length: 90
Median length: 9
Mean length: 9.062875
Min length: 2

Characters and Unicode

Total characters: 385145
Distinct characters: 58
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 84
Unique (%): 0.2%

Sample

1st row: SLG | FTO
2nd row: SLG | FTO
3rd row: SLG | FTO
4th row: SLG | FTO
5th row: SLG | FTO

Common Values

Value               Count  Frequency (%)
SLG | FTO           26261  61.8%
SLG | ITO           14856  35.0%
PET | ITO           374    0.9%
PEN | ITO           233    0.5%
PET | IZO           62     0.1%
SLG | AZO           45     0.1%
SLG                 39     0.1%
Ti-foil             30     0.1%
PET                 30     0.1%
PET | Ag-grid       28     0.1%
Other values (184)  539    1.3%
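
The values above encode a layer stack as names joined by " | ". A rough sketch of how such strings might be split into individual layers and counted; the helper `split_layers` and the four-row sample are hypothetical, so the counts below are not the dataset's.

```python
from collections import Counter

def split_layers(stack):
    """Split a stack string such as 'SLG | FTO' into its layer names."""
    return [layer.strip() for layer in stack.split("|")]

# Hypothetical mini-sample built from values that appear in the table above.
sample = ["SLG | FTO", "SLG | ITO", "PET | ITO", "SLG"]
layer_counts = Counter(layer for stack in sample for layer in split_layers(stack))
print(layer_counts.most_common(2))
```

Counting per layer rather than per full string is one way to reduce the cardinality flagged for this variable, since the 194 distinct strings share a much smaller set of layer names.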

Length

[Figure: Histogram of lengths of the category]

Value               Count  Frequency (%)
(blank)             42524  33.3%
slg                 41391  32.4%
fto                 26277  20.6%
ito                 15558  12.2%
pet                 581    0.5%
pen                 271    0.2%
azo                 99     0.1%
graphene            98     0.1%
izo                 73     0.1%
ag-nw               50     < 0.1%
Other values (124)  708    0.6%

Most occurring characters

Value              Count  Frequency (%)
(space)            85133  22.1%
T                  42625  11.1%
|                  42524  11.0%
O                  42173  10.9%
S                  41558  10.8%
G                  41525  10.8%
L                  41395  10.7%
F                  26298  6.8%
I                  15666  4.1%
P                  1005   0.3%
Other values (48)  5243   1.4%

Most occurring categories

Value              Count   Frequency (%)
Uppercase Letter   254376  66.0%
Space Separator    85133   22.1%
Math Symbol        42524   11.0%
Lowercase Letter   2634    0.7%
Dash Punctuation   223     0.1%
Decimal Number     168     < 0.1%
Other Punctuation  87      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  322    12.2%
i                  294    11.2%
n                  289    11.0%
r                  215    8.2%
g                  198    7.5%
a                  184    7.0%
l                  164    6.2%
o                  141    5.4%
p                  134    5.1%
h                  120    4.6%
Other values (13)  573    21.8%

Uppercase Letter
Value              Count  Frequency (%)
T                  42625  16.8%
O                  42173  16.6%
S                  41558  16.3%
G                  41525  16.3%
L                  41395  16.3%
F                  26298  10.3%
I                  15666  6.2%
P                  1005   0.4%
E                  933    0.4%
A                  371    0.1%
Other values (12)  827    0.3%

Decimal Number
Value  Count  Frequency (%)
3      60     35.7%
2      55     32.7%
0      18     10.7%
6      15     8.9%
1      7      4.2%
8      7      4.2%
4      5      3.0%
5      1      0.6%

Other Punctuation
Value  Count  Frequency (%)
:      45     51.7%
;      42     48.3%

Space Separator
Value    Count  Frequency (%)
(space)  85133  100.0%

Math Symbol
Value  Count  Frequency (%)
|      42524  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      223    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   257010  66.7%
Common  128135  33.3%

Most frequent character per script

Latin
Value              Count  Frequency (%)
T                  42625  16.6%
O                  42173  16.4%
S                  41558  16.2%
G                  41525  16.2%
L                  41395  16.1%
F                  26298  10.2%
I                  15666  6.1%
P                  1005   0.4%
E                  933    0.4%
A                  371    0.1%
Other values (35)  3461   1.3%

Common
Value             Count  Frequency (%)
(space)           85133  66.4%
|                 42524  33.2%
-                 223    0.2%
3                 60     < 0.1%
2                 55     < 0.1%
:                 45     < 0.1%
;                 42     < 0.1%
0                 18     < 0.1%
6                 15     < 0.1%
1                 7      < 0.1%
Other values (3)  13     < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  385145  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            85133  22.1%
T                  42625  11.1%
|                  42524  11.0%
O                  42173  10.9%
S                  41558  10.8%
G                  41525  10.8%
L                  41395  10.7%
F                  26298  6.8%
I                  15666  4.1%
P                  1005   0.3%
Other values (48)  5243   1.4%

ETL_stack_sequence
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 1444
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 332.1 KiB

TiO2-c | TiO2-mp: 11705
TiO2-c: 6882
PCBM-60: 3281
PCBM-60 | BCP: 2662
SnO2-np: 1652
Other values (1439): 16315

Length

Max length: 190
Median length: 185
Mean length: 12.302963
Min length: 2

Characters and Unicode

Total characters: 522839
Distinct characters: 80
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 532
Unique (%): 1.3%

Sample

1st row: TiO2-c | TiO2-mp
2nd row: TiO2-c | TiO2-mp
3rd row: TiO2-c | TiO2-mp
4th row: TiO2-c | TiO2-mp
5th row: TiO2-c | TiO2-mp

Common Values

Value                       Count  Frequency (%)
TiO2-c | TiO2-mp            11705  27.5%
TiO2-c                      6882   16.2%
PCBM-60                     3281   7.7%
PCBM-60 | BCP               2662   6.3%
SnO2-np                     1652   3.9%
C60 | BCP                   1508   3.5%
SnO2-c                      1399   3.3%
TiO2-c | TiO2-mp | ZrO2-mp  644    1.5%
ZnO-c                       484    1.1%
PCBM-60 | C60 | BCP         461    1.1%
Other values (1434)         11819  27.8%

Length

[Figure: Histogram of lengths of the category]

Value               Count  Frequency (%)
(blank)             28790  28.5%
tio2-c              22073  21.8%
tio2-mp             13409  13.3%
pcbm-60             10553  10.4%
bcp                 5249   5.2%
c60                 3548   3.5%
sno2-c              2267   2.2%
sno2-np             2070   2.0%
zno-c               1080   1.1%
zno-np              869    0.9%
Other values (875)  11217  11.1%

Most occurring characters

Value              Count   Frequency (%)
(space)            58628   11.2%
-                  57921   11.1%
O                  46434   8.9%
2                  43503   8.3%
i                  39041   7.5%
T                  37804   7.2%
|                  28778   5.5%
c                  26248   5.0%
C                  21542   4.1%
p                  19495   3.7%
Other values (70)  143445  27.4%

Most occurring categories

Value                  Count   Frequency (%)
Uppercase Letter       171201  32.7%
Lowercase Letter       128149  24.5%
Decimal Number         76479   14.6%
Space Separator        58628   11.2%
Dash Punctuation       57965   11.1%
Math Symbol            28778   5.5%
Other Punctuation      1004    0.2%
Close Punctuation      317     0.1%
Open Punctuation       317     0.1%
Connector Punctuation  1       < 0.1%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
O                  46434  27.1%
T                  37804  22.1%
C                  21542  12.6%
P                  18213  10.6%
B                  17341  10.1%
M                  11500  6.7%
S                  5669   3.3%
Z                  3800   2.2%
A                  1802   1.1%
I                  1427   0.8%
Other values (16)  5669   3.3%

Lowercase Letter
Value              Count  Frequency (%)
i                  39041  30.5%
c                  26248  20.5%
p                  19495  15.2%
m                  15436  12.0%
n                  14197  11.1%
e                  2193   1.7%
r                  1733   1.4%
o                  1320   1.0%
a                  1311   1.0%
h                  1274   1.0%
Other values (15)  5901   4.6%

Decimal Number
Value  Count  Frequency (%)
2      43503  56.9%
0      15396  20.1%
6      14774  19.3%
3      1069   1.4%
1      595    0.8%
7      413    0.5%
4      397    0.5%
5      203    0.3%
8      91     0.1%
9      38     < 0.1%

Other Punctuation
Value                   Count  Frequency (%)
;                       643    64.0%
,                       141    14.0%
:                       86     8.6%
@                       60     6.0%
.                       37     3.7%
(unrendered character)  24     2.4%
/                       8      0.8%
*                       5      0.5%

Close Punctuation
Value  Count  Frequency (%)
)      282    89.0%
]      31     9.8%
}      4      1.3%

Open Punctuation
Value  Count  Frequency (%)
(      282    89.0%
[      31     9.8%
{      4      1.3%

Dash Punctuation
Value                   Count  Frequency (%)
-                       57921  99.9%
(unrendered character)  44     0.1%

Space Separator
Value    Count  Frequency (%)
(space)  58628  100.0%

Math Symbol
Value  Count  Frequency (%)
|      28778  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      1      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   299350  57.3%
Common  223489  42.7%

Most frequent character per script

Latin
Value              Count  Frequency (%)
O                  46434  15.5%
i                  39041  13.0%
T                  37804  12.6%
c                  26248  8.8%
C                  21542  7.2%
p                  19495  6.5%
P                  18213  6.1%
B                  17341  5.8%
m                  15436  5.2%
n                  14197  4.7%
Other values (41)  43599  14.6%

Common
Value              Count  Frequency (%)
(space)            58628  26.2%
-                  57921  25.9%
2                  43503  19.5%
|                  28778  12.9%
0                  15396  6.9%
6                  14774  6.6%
3                  1069   0.5%
;                  643    0.3%
1                  595    0.3%
7                  413    0.2%
Other values (19)  1769   0.8%

Most occurring blocks

Value        Count   Frequency (%)
ASCII        522771  > 99.9%
Punctuation  68      < 0.1%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
(space)            58628   11.2%
-                  57921   11.1%
O                  46434   8.9%
2                  43503   8.3%
i                  39041   7.5%
T                  37804   7.2%
|                  28778   5.5%
c                  26248   5.0%
C                  21542   4.1%
p                  19495   3.7%
Other values (68)  143377  27.4%

Punctuation
Value                   Count  Frequency (%)
(unrendered character)  44     64.7%
(unrendered character)  24     35.3%

ETL_thickness
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 1419
Distinct (%): 7.6%
Missing: 23933
Missing (%): 56.3%
Memory size: 332.1 KiB

50.0: 1111
40.0: 906
30.0: 631
60.0: 522
20.0: 420
Other values (1414): 14974

Length

Max length: 46
Median length: 32
Mean length: 9.3584357
Min length: 3

Characters and Unicode

Total characters: 173730
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 543
Unique (%): 2.9%

Sample

1st row: 65.0 | nan
2nd row: 65.0 | nan
3rd row: 65.0 | nan
4th row: 65.0 | nan
5th row: 65.0 | nan

Common Values

Value                Count  Frequency (%)
50.0                 1111   2.6%
40.0                 906    2.1%
30.0                 631    1.5%
60.0                 522    1.2%
20.0                 420    1.0%
nan | 200.0          367    0.9%
30.0 | nan           344    0.8%
80.0                 313    0.7%
50.0 | nan           313    0.7%
100.0                303    0.7%
Other values (1409)  13334  31.4%
(Missing)            23933  56.3%
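
ETL_thickness is profiled as categorical because each entry is a string holding one number per layer, joined by " | ", with "nan" where a layer has no value. A minimal sketch of how such a string might be parsed into floats; the helper `parse_thickness` is hypothetical, and mapping unparseable text to NaN is an assumption of this sketch.

```python
import math

def parse_thickness(raw):
    """Parse a value like '65.0 | nan' into one float per layer.
    Entries that are not valid numbers become NaN (assumed policy)."""
    values = []
    for part in raw.split("|"):
        try:
            values.append(float(part.strip()))
        except ValueError:
            values.append(math.nan)
    return values

parsed = parse_thickness("65.0 | nan")
print(parsed)
```

After parsing, per-layer values can be treated as numeric, which avoids counting "50.0" and "50.0 | nan" as unrelated categories.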

Length

[Figure: Histogram of lengths of the category]

Value               Count  Frequency (%)
(blank)             14370  30.4%
nan                 6057   12.8%
50.0                2963   6.3%
30.0                2519   5.3%
40.0                2073   4.4%
20.0                1844   3.9%
60.0                1275   2.7%
8.0                 1170   2.5%
200.0               1170   2.5%
10.0                1163   2.5%
Other values (374)  12700  26.8%

Most occurring characters

Value             Count  Frequency (%)
0                 52415  30.2%
(space)           28740  16.5%
.                 26873  15.5%
|                 14370  8.3%
n                 12122  7.0%
5                 8225   4.7%
a                 6057   3.5%
1                 5641   3.2%
2                 5097   2.9%
3                 4300   2.5%
Other values (9)  9890   5.7%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     85552  49.2%
Space Separator    28740  16.5%
Other Punctuation  26873  15.5%
Lowercase Letter   18195  10.5%
Math Symbol        14370  8.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      52415  61.3%
5      8225   9.6%
1      5641   6.6%
2      5097   6.0%
3      4300   5.0%
4      3384   4.0%
8      2399   2.8%
6      2251   2.6%
7      1431   1.7%
9      409    0.5%

Lowercase Letter
Value  Count  Frequency (%)
n      12122  66.6%
a      6057   33.3%
u      4      < 0.1%
k      4      < 0.1%
o      4      < 0.1%
w      4      < 0.1%

Space Separator
Value    Count  Frequency (%)
(space)  28740  100.0%

Other Punctuation
Value  Count  Frequency (%)
.      26873  100.0%

Math Symbol
Value  Count  Frequency (%)
|      14370  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  155535  89.5%
Latin   18195   10.5%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                 52415  33.7%
(space)           28740  18.5%
.                 26873  17.3%
|                 14370  9.2%
5                 8225   5.3%
1                 5641   3.6%
2                 5097   3.3%
3                 4300   2.8%
4                 3384   2.2%
8                 2399   1.5%
Other values (3)  4091   2.6%

Latin
Value  Count  Frequency (%)
n      12122  66.6%
a      6057   33.3%
u      4      < 0.1%
k      4      < 0.1%
o      4      < 0.1%
w      4      < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  173730  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 52415  30.2%
(space)           28740  16.5%
.                 26873  15.5%
|                 14370  8.3%
n                 12122  7.0%
5                 8225   4.7%
a                 6057   3.5%
1                 5641   3.2%
2                 5097   2.9%
3                 4300   2.5%
Other values (9)  9890   5.7%

ETL_additives_compounds
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 447
Distinct (%) 1.1%
Missing 3293
Missing (%) 7.7%
Memory size 332.1 KiB
Unknown
33689 
Unknown | TiCl4
 
1000
Undoped
 
530
Unknown | Li-TFSI
 
529
TiCl4
 
472
Other values (442)
 
2984

Length

Max length 51
Median length 7
Mean length 7.7830323
Min length 1

Characters and Unicode

Total characters 305126
Distinct characters 73
Distinct categories 9
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 132
Unique (%) 0.3%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value               Count  Frequency (%)
Unknown             33689  79.3%
Unknown | TiCl4     1000   2.4%
Undoped             530    1.2%
Unknown | Li-TFSI   529    1.2%
TiCl4               472    1.1%
TiCl4 | Unknown     334    0.8%
Undoped | Undoped   313    0.7%
Nb                  78     0.2%
nan | TiCl4         75     0.2%
Li-TFSI             45     0.1%
Other values (437)  2139   5.0%
(Missing)           3293   7.7%
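
Because many cells hold several " | "-separated entries, counting whole cells (as in the table above) and counting individual entries give different pictures. A minimal sketch of per-entry counting, assuming " | " separates layers; `additive_tokens` is a hypothetical helper and the four cells are taken from the table purely as an illustration.

```python
from collections import Counter

def additive_tokens(cell):
    """Split a cell such as 'Unknown | TiCl4' into its per-layer entries."""
    return [part.strip() for part in cell.split("|") if part.strip()]

# Illustrative cells copied from the Common Values table.
cells = ["Unknown", "Unknown | TiCl4", "TiCl4 | Unknown", "Undoped | Undoped"]

counts = Counter(token for cell in cells for token in additive_tokens(cell))
print(counts.most_common())
```

Note that "Unknown | TiCl4" and "TiCl4 | Unknown" contribute the same entries here even though they are distinct categories in the report, since the order of entries presumably follows the layer order.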

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 36562
77.7%
3704
 
7.9%
ticl4 2206
 
4.7%
undoped 1366
 
2.9%
li-tfsi 688
 
1.5%
nan 219
 
0.5%
nb 151
 
0.3%
graphene 58
 
0.1%
mg 55
 
0.1%
in 45
 
0.1%
Other values (302) 1972
 
4.2%

Most occurring characters

ValueCountFrequency (%)
n 112111
36.7%
o 38274
 
12.5%
U 37959
 
12.4%
w 36568
 
12.0%
k 36562
 
12.0%
7809
 
2.6%
| 3704
 
1.2%
i 3495
 
1.1%
T 3190
 
1.0%
d 2891
 
0.9%
Other values (63) 22563
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 239385
78.5%
Uppercase Letter 49818
 
16.3%
Space Separator 7822
 
2.6%
Math Symbol 3704
 
1.2%
Decimal Number 2974
 
1.0%
Dash Punctuation 1001
 
0.3%
Other Punctuation 306
 
0.1%
Close Punctuation 58
 
< 0.1%
Open Punctuation 58
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 37959
76.2%
T 3190
 
6.4%
C 2662
 
5.3%
S 964
 
1.9%
I 938
 
1.9%
F 853
 
1.7%
L 785
 
1.6%
N 416
 
0.8%
A 365
 
0.7%
P 267
 
0.5%
Other values (15) 1419
 
2.8%
Lowercase Letter
ValueCountFrequency (%)
n 112111
46.8%
o 38274
 
16.0%
w 36568
 
15.3%
k 36562
 
15.3%
i 3495
 
1.5%
d 2891
 
1.2%
l 2718
 
1.1%
e 2030
 
0.8%
p 1608
 
0.7%
a 762
 
0.3%
Other values (14) 2366
 
1.0%
Decimal Number
ValueCountFrequency (%)
4 2288
76.9%
2 259
 
8.7%
3 174
 
5.9%
0 88
 
3.0%
1 60
 
2.0%
6 55
 
1.8%
5 29
 
1.0%
9 10
 
0.3%
7 8
 
0.3%
8 3
 
0.1%
Other Punctuation
ValueCountFrequency (%)
; 198
64.7%
: 39
 
12.7%
@ 29
 
9.5%
· 14
 
4.6%
, 10
 
3.3%
. 6
 
2.0%
* 5
 
1.6%
5
 
1.6%
Space Separator
ValueCountFrequency (%)
7809
99.8%
  13
 
0.2%
Math Symbol
ValueCountFrequency (%)
| 3704
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1001
100.0%
Close Punctuation
ValueCountFrequency (%)
) 58
100.0%
Open Punctuation
ValueCountFrequency (%)
( 58
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 289203
94.8%
Common 15923
 
5.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 112111
38.8%
o 38274
 
13.2%
U 37959
 
13.1%
w 36568
 
12.6%
k 36562
 
12.6%
i 3495
 
1.2%
T 3190
 
1.1%
d 2891
 
1.0%
l 2718
 
0.9%
C 2662
 
0.9%
Other values (39) 12773
 
4.4%
Common
ValueCountFrequency (%)
7809
49.0%
| 3704
23.3%
4 2288
 
14.4%
- 1001
 
6.3%
2 259
 
1.6%
; 198
 
1.2%
3 174
 
1.1%
0 88
 
0.6%
1 60
 
0.4%
) 58
 
0.4%
Other values (14) 284
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 305094
> 99.9%
None 27
 
< 0.1%
Punctuation 5
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 112111
36.7%
o 38274
 
12.5%
U 37959
 
12.4%
w 36568
 
12.0%
k 36562
 
12.0%
7809
 
2.6%
| 3704
 
1.2%
i 3495
 
1.1%
T 3190
 
1.0%
d 2891
 
0.9%
Other values (60) 22531
 
7.4%
None
ValueCountFrequency (%)
· 14
51.9%
  13
48.1%
Punctuation
ValueCountFrequency (%)
5
100.0%

ETL_deposition_procedure
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 368
Distinct (%) 0.9%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB
Spin-coating
12263 
Spin-coating | Spin-coating
11725 
Spray-pyrolys | Spin-coating
4655 
Evaporation | Evaporation
1715 
Spin-coating | Evaporation
1496 
Other values (363)
10643 

Length

Max length 155
Median length 102
Mean length 21.915618
Min length 3

Characters and Unicode

Total characters 931348
Distinct characters 47
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 68
Unique (%) 0.2%

Sample

1st row Spray-pyrolys | Spin-coating
2nd row Spray-pyrolys | Spin-coating
3rd row Spray-pyrolys | Spin-coating
4th row Spray-pyrolys | Spin-coating
5th row Spray-pyrolys | Spin-coating

Common Values

Value                                       Count  Frequency (%)
Spin-coating                                12263  28.9%
Spin-coating | Spin-coating                 11725  27.6%
Spray-pyrolys | Spin-coating                4655   11.0%
Evaporation | Evaporation                   1715   4.0%
Spin-coating | Evaporation                  1496   3.5%
CBD                                         1127   2.7%
Spray-pyrolys                               923    2.2%
Spin-coating | Evaporation | Evaporation    722    1.7%
Unknown                                     685    1.6%
Spin-coating | Spin-coating | Spin-coating  531    1.2%
Other values (358)                          6655   15.7%
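
The values above read as one deposition method per ETL layer, joined by " | ". The character breakdown for this variable also lists 1558 occurrences of ">", which would fit a ">>" marker between successive steps within a single layer; that reading is an assumption, as is the second example cell below, which does not appear in the report. `parse_procedure` is a hypothetical helper.

```python
def parse_procedure(cell):
    """Return a list of layers, each a list of deposition steps.

    Assumptions: ' | ' separates layers and ' >> ' separates
    successive steps applied to the same layer.
    """
    layers = []
    for layer in cell.split("|"):
        steps = [step.strip() for step in layer.split(">>")]
        layers.append(steps)
    return layers

print(parse_procedure("Spray-pyrolys | Spin-coating"))
# Hypothetical cell, used only to show the step separator.
print(parse_procedure("Spin-coating >> CBD | Evaporation"))
```

With this structure, `len(parse_procedure(cell))` gives the number of layers, which could then be compared against the layer counts implied by the other ETL columns.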

Length

Histogram of lengths of the category
ValueCountFrequency (%)
spin-coating 48497
46.5%
29381
28.2%
evaporation 7590
 
7.3%
spray-pyrolys 6645
 
6.4%
cbd 2070
 
2.0%
printing 1729
 
1.7%
screen 1728
 
1.7%
hydrothermal 1025
 
1.0%
unknown 902
 
0.9%
ald 810
 
0.8%
Other values (86) 3918
 
3.8%

Most occurring characters

ValueCountFrequency (%)
n 115684
12.4%
i 111704
12.0%
o 75063
 
8.1%
a 73557
 
7.9%
p 73518
 
7.9%
t 62756
 
6.7%
61798
 
6.6%
S 57628
 
6.2%
- 56110
 
6.0%
g 52539
 
5.6%
Other values (37) 190991
20.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 705136
75.7%
Uppercase Letter 78144
 
8.4%
Space Separator 61798
 
6.6%
Dash Punctuation 56110
 
6.0%
Math Symbol 30160
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 115684
16.4%
i 111704
15.8%
o 75063
10.6%
a 73557
10.4%
p 73518
10.4%
t 62756
8.9%
g 52539
7.5%
c 51839
7.4%
r 28493
 
4.0%
y 21197
 
3.0%
Other values (16) 38786
 
5.5%
Uppercase Letter
ValueCountFrequency (%)
S 57628
73.7%
E 7963
 
10.2%
D 3888
 
5.0%
C 2199
 
2.8%
B 2085
 
2.7%
H 1033
 
1.3%
U 921
 
1.2%
L 853
 
1.1%
A 842
 
1.1%
M 226
 
0.3%
Other values (7) 506
 
0.6%
Math Symbol
ValueCountFrequency (%)
| 28602
94.8%
> 1558
 
5.2%
Space Separator
ValueCountFrequency (%)
61798
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 56110
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 783280
84.1%
Common 148068
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 115684
14.8%
i 111704
14.3%
o 75063
9.6%
a 73557
9.4%
p 73518
9.4%
t 62756
8.0%
S 57628
7.4%
g 52539
6.7%
c 51839
6.6%
r 28493
 
3.6%
Other values (33) 80499
10.3%
Common
ValueCountFrequency (%)
61798
41.7%
- 56110
37.9%
| 28602
19.3%
> 1558
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 931348
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 115684
12.4%
i 111704
12.0%
o 75063
 
8.1%
a 73557
 
7.9%
p 73518
 
7.9%
t 62756
 
6.7%
61798
 
6.6%
S 57628
 
6.2%
- 56110
 
6.0%
g 52539
 
5.6%
Other values (37) 190991
20.5%

ETL_surface_treatment_before_next_deposition_step
Categorical

HIGH CORRELATION  MISSING 

Distinct 13
Distinct (%) 7.8%
Missing 42330
Missing (%) 99.6%
Memory size 332.1 KiB
UV-Ozone
108 
Plasma
12 
UV
11 
Ozone
 
10
O2 plasma
 
5
Other values (8)
21 

Length

Max length 30
Median length 8
Mean length 8.1616766
Min length 2

Characters and Unicode

Total characters 1363
Distinct characters 34
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row UV
2nd row UV
3rd row Ozone
4th row Plasma
5th row Water

Common Values

Value                           Count  Frequency (%)
UV-Ozone                        108    0.3%
Plasma                          12     < 0.1%
UV                              11     < 0.1%
Ozone                           10     < 0.1%
O2 plasma                       5      < 0.1%
ZnAl-LDH and thermal annealing  4      < 0.1%
Wash with IPA                   4      < 0.1%
Washed with methanol            3      < 0.1%
Water                           2      < 0.1%
CO2                             2      < 0.1%
Other values (3)                6      < 0.1%
(Missing)                       42330  99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
uv-ozone 108
52.9%
plasma 19
 
9.3%
uv 11
 
5.4%
ozone 10
 
4.9%
with 7
 
3.4%
o2 5
 
2.5%
znal-ldh 4
 
2.0%
and 4
 
2.0%
thermal 4
 
2.0%
annealing 4
 
2.0%
Other values (11) 28
 
13.7%

Most occurring characters

ValueCountFrequency (%)
n 145
10.6%
e 142
10.4%
O 125
9.2%
o 123
9.0%
U 119
8.7%
V 119
8.7%
z 118
8.7%
- 112
8.2%
a 68
 
5.0%
37
 
2.7%
Other values (24) 255
18.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 781
57.3%
Uppercase Letter 424
31.1%
Dash Punctuation 112
 
8.2%
Space Separator 37
 
2.7%
Decimal Number 9
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 145
18.6%
e 142
18.2%
o 123
15.7%
z 118
15.1%
a 68
8.7%
l 34
 
4.4%
s 26
 
3.3%
m 26
 
3.3%
h 23
 
2.9%
t 20
 
2.6%
Other values (8) 56
 
7.2%
Uppercase Letter
ValueCountFrequency (%)
O 125
29.5%
U 119
28.1%
V 119
28.1%
P 16
 
3.8%
W 9
 
2.1%
H 8
 
1.9%
A 8
 
1.9%
D 4
 
0.9%
L 4
 
0.9%
Z 4
 
0.9%
Other values (3) 8
 
1.9%
Dash Punctuation
ValueCountFrequency (%)
- 112
100.0%
Space Separator
ValueCountFrequency (%)
37
100.0%
Decimal Number
ValueCountFrequency (%)
2 9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1205
88.4%
Common 158
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 145
12.0%
e 142
11.8%
O 125
10.4%
o 123
10.2%
U 119
9.9%
V 119
9.9%
z 118
9.8%
a 68
 
5.6%
l 34
 
2.8%
s 26
 
2.2%
Other values (21) 186
15.4%
Common
ValueCountFrequency (%)
- 112
70.9%
37
 
23.4%
2 9
 
5.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1363
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 145
10.6%
e 142
10.4%
O 125
9.2%
o 123
9.0%
U 119
8.7%
V 119
8.7%
z 118
8.7%
- 112
8.2%
a 68
 
5.0%
37
 
2.7%
Other values (24) 255
18.7%

Perovskite_dimension_2D
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB
False
41493 
True
 
1004
Value  Count  Frequency (%)
False  41493  97.6%
True   1004   2.4%

Perovskite_dimension_3D
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB
True
41145 
False
 
1352
Value  Count  Frequency (%)
True   41145  96.8%
False  1352   3.2%

Perovskite_dimension_3D_with_2D_capping_layer
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB
False
42307 
True
 
190
Value  Count  Frequency (%)
False  42307  99.6%
True   190    0.4%
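
The three Perovskite_dimension_* flags are reported separately, and their True counts (1004, 41145 and 190) do not add up to the 42497 observations, so the flags are presumably neither mutually exclusive nor exhaustive. One way to inspect their combinations is to collapse them into a single label. The sketch below is illustrative only: the precedence order and the label names are assumptions, and `dimension_label` is a hypothetical helper.

```python
def dimension_label(is_2d, is_3d, has_2d_cap):
    """Collapse the three boolean flags into one label.

    The precedence (capping layer first, then mixed, then pure 3D/2D)
    is an assumption made for this sketch, not something the report defines.
    """
    if has_2d_cap:
        return "3D with 2D capping layer"
    if is_2d and is_3d:
        return "mixed 2D/3D"
    if is_3d:
        return "3D"
    if is_2d:
        return "2D"
    return "other"

print(dimension_label(False, True, False))
print(dimension_label(False, False, False))
```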

Perovskite_composition_perovskite_ABC3_structure
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB
True
41709 
False
 
788
Value  Count  Frequency (%)
True   41709  98.1%
False  788    1.9%

Perovskite_composition_a_ions
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 264
Distinct (%) 0.6%
Missing 31
Missing (%) 0.1%
Memory size 332.1 KiB
MA
28620 
FA; MA
3985 
Cs; FA; MA
3823 
Cs
 
2070
Cs; FA
 
1061
Other values (259)
2907 

Length

Max length 26
Median length 2
Mean length 3.493642
Min length 1

Characters and Unicode

Total characters 148361
Distinct characters 58
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 96
Unique (%) 0.2%

Sample

1st row Cs
2nd row Cs
3rd row Cs
4th row Cs
5th row Cs

Common Values

Value               Count  Frequency (%)
MA                  28620  67.3%
FA; MA              3985   9.4%
Cs; FA; MA          3823   9.0%
Cs                  2070   4.9%
Cs; FA              1061   2.5%
FA                  991    2.3%
BA; MA              218    0.5%
Cs; MA              132    0.3%
(PEA); MA           130    0.3%
GU; MA              56     0.1%
Other values (254)  1380   3.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ma 37756
65.4%
fa 10382
 
18.0%
cs 7590
 
13.1%
ba 376
 
0.7%
277
 
0.5%
pea 271
 
0.5%
ag 116
 
0.2%
gu 107
 
0.2%
5-ava 55
 
0.1%
ha 48
 
0.1%
Other values (118) 776
 
1.3%

Most occurring characters

ValueCountFrequency (%)
A 49583
33.4%
M 37913
25.6%
15288
 
10.3%
; 14734
 
9.9%
F 10436
 
7.0%
C 7657
 
5.2%
s 7592
 
5.1%
( 867
 
0.6%
) 867
 
0.6%
P 523
 
0.4%
Other values (48) 2901
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 108037
72.8%
Space Separator 15288
 
10.3%
Other Punctuation 14736
 
9.9%
Lowercase Letter 7965
 
5.4%
Open Punctuation 867
 
0.6%
Close Punctuation 867
 
0.6%
Math Symbol 277
 
0.2%
Decimal Number 250
 
0.2%
Dash Punctuation 74
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 49583
45.9%
M 37913
35.1%
F 10436
 
9.7%
C 7657
 
7.1%
P 523
 
0.5%
B 483
 
0.4%
E 464
 
0.4%
H 177
 
0.2%
D 128
 
0.1%
G 124
 
0.1%
Other values (12) 549
 
0.5%
Lowercase Letter
ValueCountFrequency (%)
s 7592
95.3%
g 116
 
1.5%
b 44
 
0.6%
h 31
 
0.4%
a 28
 
0.4%
y 27
 
0.3%
i 21
 
0.3%
m 20
 
0.3%
r 14
 
0.2%
n 13
 
0.2%
Other values (10) 59
 
0.7%
Decimal Number
ValueCountFrequency (%)
3 68
27.2%
4 65
26.0%
5 64
25.6%
1 24
 
9.6%
6 13
 
5.2%
2 11
 
4.4%
7 2
 
0.8%
9 2
 
0.8%
8 1
 
0.4%
Other Punctuation
ValueCountFrequency (%)
; 14734
> 99.9%
. 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
15288
100.0%
Open Punctuation
ValueCountFrequency (%)
( 867
100.0%
Close Punctuation
ValueCountFrequency (%)
) 867
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 74
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 116002
78.2%
Common 32359
 
21.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 49583
42.7%
M 37913
32.7%
F 10436
 
9.0%
C 7657
 
6.6%
s 7592
 
6.5%
P 523
 
0.5%
B 483
 
0.4%
E 464
 
0.4%
H 177
 
0.2%
D 128
 
0.1%
Other values (32) 1046
 
0.9%
Common
ValueCountFrequency (%)
15288
47.2%
; 14734
45.5%
( 867
 
2.7%
) 867
 
2.7%
| 277
 
0.9%
- 74
 
0.2%
3 68
 
0.2%
4 65
 
0.2%
5 64
 
0.2%
1 24
 
0.1%
Other values (6) 31
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 148361
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 49583
33.4%
M 37913
25.6%
15288
 
10.3%
; 14734
 
9.9%
F 10436
 
7.0%
C 7657
 
5.2%
s 7592
 
5.1%
( 867
 
0.6%
) 867
 
0.6%
P 523
 
0.4%
Other values (48) 2901
 
2.0%

Perovskite_composition_a_ions_coefficients
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 565
Distinct (%) 1.3%
Missing 59
Missing (%) 0.1%
Memory size 332.1 KiB
1
31551 
0.05; 0.79; 0.16
 
1555
0.85; 0.15
 
1472
0.83; 0.17
 
556
0.05; 0.81; 0.14
 
474
Other values (560)
6830 

Length

Max length 29
Median length 1
Mean length 3.6501956
Min length 1

Characters and Unicode

Total characters 154907
Distinct characters 15
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 190
Unique (%) 0.4%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value               Count  Frequency (%)
1                   31551  74.2%
0.05; 0.79; 0.16    1555   3.7%
0.85; 0.15          1472   3.5%
0.83; 0.17          556    1.3%
0.05; 0.81; 0.14    474    1.1%
0.1; 0.9            318    0.7%
0.17; 0.83          281    0.7%
0.2; 0.8            247    0.6%
0.3; 0.7            227    0.5%
0.15; 0.85          218    0.5%
Other values (555)  5539   13.0%
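
The coefficient strings appear to line up position by position with the "; "-separated ions in Perovskite_composition_a_ions. A minimal sketch of pairing the two columns, assuming that alignment holds; `pair_ions` is a hypothetical helper, and combining "Cs; FA; MA" with "0.05; 0.79; 0.16" is an illustration, since the report does not show which rows carry both values. Cells containing "|" (multiple layers) or the placeholder "x" are out of scope here.

```python
import math

def pair_ions(ions, coefficients):
    """Map each A-site ion to its coefficient, e.g. {'Cs': 0.05, ...}.

    Assumption: both strings are '; '-separated and share the same order.
    """
    names = [part.strip() for part in ions.split(";")]
    values = [float(part) for part in coefficients.split(";")]
    if len(names) != len(values):
        raise ValueError("ion and coefficient counts differ")
    return dict(zip(names, values))

site = pair_ions("Cs; FA; MA", "0.05; 0.79; 0.16")
total = sum(site.values())
print(site, math.isclose(total, 1.0))
```

The sum is compared with `math.isclose` rather than `==` because decimal fractions such as 0.79 are not exactly representable as binary floats.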

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 32280
56.0%
0.05 3352
 
5.8%
0.15 2390
 
4.1%
0.85 1830
 
3.2%
0.16 1782
 
3.1%
0.79 1700
 
2.9%
2 1034
 
1.8%
0.1 995
 
1.7%
0.83 964
 
1.7%
0.17 924
 
1.6%
Other values (264) 10406
 
18.0%

Most occurring characters

ValueCountFrequency (%)
1 40940
26.4%
0 27528
17.8%
. 23221
15.0%
15219
 
9.8%
; 14665
 
9.5%
5 10242
 
6.6%
8 5304
 
3.4%
7 4734
 
3.1%
9 3069
 
2.0%
6 2942
 
1.9%
Other values (5) 7043
 
4.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 101479
65.5%
Other Punctuation 37886
 
24.5%
Space Separator 15219
 
9.8%
Math Symbol 277
 
0.2%
Lowercase Letter 46
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 40940
40.3%
0 27528
27.1%
5 10242
 
10.1%
8 5304
 
5.2%
7 4734
 
4.7%
9 3069
 
3.0%
6 2942
 
2.9%
2 2663
 
2.6%
3 2434
 
2.4%
4 1623
 
1.6%
Other Punctuation
ValueCountFrequency (%)
. 23221
61.3%
; 14665
38.7%
Space Separator
ValueCountFrequency (%)
15219
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%
Lowercase Letter
ValueCountFrequency (%)
x 46
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 154861
> 99.9%
Latin 46
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
1 40940
26.4%
0 27528
17.8%
. 23221
15.0%
15219
 
9.8%
; 14665
 
9.5%
5 10242
 
6.6%
8 5304
 
3.4%
7 4734
 
3.1%
9 3069
 
2.0%
6 2942
 
1.9%
Other values (4) 6997
 
4.5%
Latin
ValueCountFrequency (%)
x 46
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 154907
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 40940
26.4%
0 27528
17.8%
. 23221
15.0%
15219
 
9.8%
; 14665
 
9.5%
5 10242
 
6.6%
8 5304
 
3.4%
7 4734
 
3.1%
9 3069
 
2.0%
6 2942
 
1.9%
Other values (5) 7043
 
4.5%

Perovskite_composition_b_ions
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct 53
Distinct (%) 0.1%
Missing 5
Missing (%) < 0.1%
Memory size 332.1 KiB
Pb
40415 
Sn
 
636
Pb; Sn
 
499
Bi
 
263
Pb | Pb
 
246
Other values (48)
 
433

Length

Max length 22
Median length 2
Mean length 2.1077615
Min length 1

Characters and Unicode

Total characters 89563
Distinct characters 30
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 11
Unique (%) < 0.1%

Sample

1st row Sn
2nd row Sn
3rd row Sn
4th row Sn
5th row Sn

Common Values

Value              Count  Frequency (%)
Pb                 40415  95.1%
Sn                 636    1.5%
Pb; Sn             499    1.2%
Bi                 263    0.6%
Pb | Pb            246    0.6%
Sb                 84     0.2%
Ag; Pb             35     0.1%
Pb; Sb             34     0.1%
Pb; Sr             20     < 0.1%
Ba; Pb             19     < 0.1%
Other values (43)  241    0.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pb 41696
95.1%
sn 1158
 
2.6%
bi 305
 
0.7%
277
 
0.6%
sb 131
 
0.3%
ag 44
 
0.1%
cu 28
 
0.1%
ge 21
 
< 0.1%
ba 20
 
< 0.1%
sr 20
 
< 0.1%
Other values (20) 145
 
0.3%

Most occurring characters

ValueCountFrequency (%)
b 41835
46.7%
P 41696
46.6%
1353
 
1.5%
S 1310
 
1.5%
n 1187
 
1.3%
; 799
 
0.9%
B 325
 
0.4%
i 320
 
0.4%
| 277
 
0.3%
g 58
 
0.1%
Other values (20) 403
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 43568
48.6%
Lowercase Letter 43566
48.6%
Space Separator 1353
 
1.5%
Other Punctuation 799
 
0.9%
Math Symbol 277
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 41696
95.7%
S 1310
 
3.0%
B 325
 
0.7%
C 54
 
0.1%
A 46
 
0.1%
G 21
 
< 0.1%
F 19
 
< 0.1%
T 18
 
< 0.1%
N 16
 
< 0.1%
M 16
 
< 0.1%
Other values (6) 47
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
b 41835
96.0%
n 1187
 
2.7%
i 320
 
0.7%
g 58
 
0.1%
e 51
 
0.1%
u 41
 
0.1%
a 36
 
0.1%
r 21
 
< 0.1%
o 15
 
< 0.1%
m 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1353
100.0%
Other Punctuation
ValueCountFrequency (%)
; 799
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 87134
97.3%
Common 2429
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
b 41835
48.0%
P 41696
47.9%
S 1310
 
1.5%
n 1187
 
1.4%
B 325
 
0.4%
i 320
 
0.4%
g 58
 
0.1%
C 54
 
0.1%
e 51
 
0.1%
A 46
 
0.1%
Other values (17) 252
 
0.3%
Common
ValueCountFrequency (%)
1353
55.7%
; 799
32.9%
| 277
 
11.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 89563
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
b 41835
46.7%
P 41696
46.6%
1353
 
1.5%
S 1310
 
1.5%
n 1187
 
1.3%
; 799
 
0.9%
B 325
 
0.4%
i 320
 
0.4%
| 277
 
0.3%
g 58
 
0.1%
Other values (20) 403
 
0.4%

Perovskite_composition_b_ions_coefficients
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 148
Distinct (%) 0.3%
Missing 34
Missing (%) 0.1%
Memory size 332.1 KiB
1
39826 
1.0
 
462
4
 
317
2
 
247
1 | 1
 
230
Other values (143)
 
1381

Length

Max length 19
Median length 1
Mean length 1.2025764
Min length 1

Characters and Unicode

Total characters 51065
Distinct characters 15
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 51
Unique (%) 0.1%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value               Count  Frequency (%)
1                   39826  93.7%
1.0                 462    1.1%
4                   317    0.7%
2                   247    0.6%
1 | 1               230    0.5%
3                   226    0.5%
0.5; 0.5            175    0.4%
5                   128    0.3%
0.75; 0.25          73     0.2%
0.4; 0.6            53     0.1%
Other values (138)  726    1.7%
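
Because this column is stored as text, "1" and "1.0" are counted as two separate categories even though they denote the same coefficient, which inflates the distinct count. A minimal sketch of normalising the number formatting before counting, assuming " | " separates layers and "; " separates ions; `canonical_number` and `canonical_cell` are hypothetical helpers.

```python
def canonical_number(token):
    """Return '1' for '1.0'; leave non-numeric tokens such as 'x' untouched."""
    try:
        value = float(token)
    except ValueError:
        return token
    # Whole numbers are written without a decimal point.
    return str(int(value)) if value.is_integer() else repr(value)

def canonical_cell(cell):
    """Normalise every coefficient in a cell while keeping its separators."""
    layers = []
    for layer in cell.split("|"):
        numbers = [canonical_number(tok.strip()) for tok in layer.split(";")]
        layers.append("; ".join(numbers))
    return " | ".join(layers)

print(canonical_cell("1.0"))
print(canonical_cell("0.75; 0.25"))
```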

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 40411
92.2%
1.0 462
 
1.1%
0.5 351
 
0.8%
4 318
 
0.7%
277
 
0.6%
2 256
 
0.6%
3 244
 
0.6%
5 128
 
0.3%
0.25 107
 
0.2%
0.75 105
 
0.2%
Other values (130) 1158
 
2.6%

Most occurring characters

ValueCountFrequency (%)
1 41078
80.4%
0 2321
 
4.5%
. 2068
 
4.0%
1354
 
2.7%
5 933
 
1.8%
; 800
 
1.6%
2 474
 
0.9%
4 455
 
0.9%
9 432
 
0.8%
3 344
 
0.7%
Other values (5) 806
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 46565
91.2%
Other Punctuation 2868
 
5.6%
Space Separator 1354
 
2.7%
Math Symbol 277
 
0.5%
Lowercase Letter 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 41078
88.2%
0 2321
 
5.0%
5 933
 
2.0%
2 474
 
1.0%
4 455
 
1.0%
9 432
 
0.9%
3 344
 
0.7%
7 241
 
0.5%
6 163
 
0.4%
8 124
 
0.3%
Other Punctuation
ValueCountFrequency (%)
. 2068
72.1%
; 800
 
27.9%
Space Separator
ValueCountFrequency (%)
1354
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%
Lowercase Letter
ValueCountFrequency (%)
x 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 51064
> 99.9%
Latin 1
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
1 41078
80.4%
0 2321
 
4.5%
. 2068
 
4.0%
1354
 
2.7%
5 933
 
1.8%
; 800
 
1.6%
2 474
 
0.9%
4 455
 
0.9%
9 432
 
0.8%
3 344
 
0.7%
Other values (4) 805
 
1.6%
Latin
ValueCountFrequency (%)
x 1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51065
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 41078
80.4%
0 2321
 
4.5%
. 2068
 
4.0%
1354
 
2.7%
5 933
 
1.8%
; 800
 
1.6%
2 474
 
0.9%
4 455
 
0.9%
9 432
 
0.8%
3 344
 
0.7%
Other values (5) 806
 
1.6%

Perovskite_composition_c_ions
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 33
Distinct (%) 0.1%
Missing 7
Missing (%) < 0.1%
Memory size 332.1 KiB
I
31966 
Br; I
9136 
Br
 
981
Br; I | I
 
87
I | I
 
86
Other values (28)
 
234

Length

Max length 29
Median length 1
Mean length 1.9401506
Min length 1

Characters and Unicode

Total characters 82437
Distinct characters 17
Distinct categories 8
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 5
Unique (%) < 0.1%

Sample

1st row I
2nd row Br; I
3rd row Br; I
4th row Br; I
5th row Br

Common Values

Value              Count  Frequency (%)
I                  31966  75.2%
Br; I              9136   21.5%
Br                 981    2.3%
Br; I | I          87     0.2%
I | I              86     0.2%
Cl                 35     0.1%
Br; I | Br; I      33     0.1%
O                  32     0.1%
Cl; I              21     < 0.1%
Br; Cl; I          11     < 0.1%
Other values (23)  102    0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
i 41672
79.4%
br 10358
 
19.7%
277
 
0.5%
cl 80
 
0.2%
o 32
 
0.1%
scn 13
 
< 0.1%
pf6 12
 
< 0.1%
f 12
 
< 0.1%
bf4 9
 
< 0.1%
s 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
I 41672
50.6%
B 10367
 
12.6%
r 10358
 
12.6%
9977
 
12.1%
; 9423
 
11.4%
| 277
 
0.3%
C 93
 
0.1%
l 80
 
0.1%
F 33
 
< 0.1%
) 32
 
< 0.1%
Other values (7) 125
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 52237
63.4%
Lowercase Letter 10438
 
12.7%
Space Separator 9977
 
12.1%
Other Punctuation 9423
 
11.4%
Math Symbol 277
 
0.3%
Close Punctuation 32
 
< 0.1%
Open Punctuation 32
 
< 0.1%
Decimal Number 21
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I 41672
79.8%
B 10367
 
19.8%
C 93
 
0.2%
F 33
 
0.1%
O 32
 
0.1%
S 15
 
< 0.1%
N 13
 
< 0.1%
P 12
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
r 10358
99.2%
l 80
 
0.8%
Decimal Number
ValueCountFrequency (%)
6 12
57.1%
4 9
42.9%
Space Separator
ValueCountFrequency (%)
9977
100.0%
Other Punctuation
ValueCountFrequency (%)
; 9423
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%
Close Punctuation
ValueCountFrequency (%)
) 32
100.0%
Open Punctuation
ValueCountFrequency (%)
( 32
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 62675
76.0%
Common 19762
 
24.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 41672
66.5%
B 10367
 
16.5%
r 10358
 
16.5%
C 93
 
0.1%
l 80
 
0.1%
F 33
 
0.1%
O 32
 
0.1%
S 15
 
< 0.1%
N 13
 
< 0.1%
P 12
 
< 0.1%
Common
ValueCountFrequency (%)
9977
50.5%
; 9423
47.7%
| 277
 
1.4%
) 32
 
0.2%
( 32
 
0.2%
6 12
 
0.1%
4 9
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 82437
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 41672
50.6%
B 10367
 
12.6%
r 10358
 
12.6%
9977
 
12.1%
; 9423
 
11.4%
| 277
 
0.3%
C 93
 
0.1%
l 80
 
0.1%
F 33
 
< 0.1%
) 32
 
< 0.1%
Other values (7) 125
 
0.2%

Perovskite_composition_c_ions_coefficients
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 370
Distinct (%) 0.9%
Missing 41
Missing (%) 0.1%
Memory size 332.1 KiB
3
31712 
0.45; 2.55
 
2269
0.51; 2.49
 
1858
1; 2
 
737
0.3; 2.7
 
363
Other values (365)
5517 

Length

Max length 34
Median length 1
Mean length 2.7958592
Min length 1

Characters and Unicode

Total characters 118701
Distinct characters 15
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 141
Unique (%) 0.3%

Sample

1st row 3
2nd row 0.3; 2.7
3rd row 1.5; 1.5
4th row 2.7; 0.3
5th row 3

Common Values

Value               Count  Frequency (%)
3                   31712  74.6%
0.45; 2.55          2269   5.3%
0.51; 2.49          1858   4.4%
1; 2                737    1.7%
0.3; 2.7            363    0.9%
2; 1                327    0.8%
13                  317    0.7%
0.5; 2.5            285    0.7%
0.15; 2.85          223    0.5%
9                   218    0.5%
Other values (360)  4147   9.8%
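
Most of the common values add up to 3 (for example 0.45 + 2.55), as one would expect for the C site of an ABC3 composition, while values such as 13 and 9 do not. A minimal sketch of a consistency check along those lines, assuming the target of 3 applies to single-layer cells whose tokens are all numeric; `c_site_total` and `is_abc3` are hypothetical helpers, and the link to the Perovskite_composition_perovskite_ABC3_structure flag is an assumption rather than something the report states.

```python
import math

def c_site_total(cell):
    """Sum the '; '-separated coefficients of a single-layer cell."""
    return sum(float(token) for token in cell.split(";"))

def is_abc3(cell, target=3.0):
    """True when the coefficients add up to the assumed ABC3 total of 3."""
    return math.isclose(c_site_total(cell), target, abs_tol=1e-9)

for cell in ["3", "0.45; 2.55", "1; 2", "13"]:
    print(cell, is_abc3(cell))
```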

Length

Histogram of lengths of the category
ValueCountFrequency (%)
3 31942
61.0%
0.45 2369
 
4.5%
2.55 2338
 
4.5%
0.51 1945
 
3.7%
2.49 1915
 
3.7%
1 1226
 
2.3%
2 1131
 
2.2%
0.3 434
 
0.8%
2.7 411
 
0.8%
13 317
 
0.6%
Other values (331) 8342
 
15.9%

Most occurring characters

ValueCountFrequency (%)
3 33736
28.4%
. 16329
13.8%
5 12066
 
10.2%
9915
 
8.4%
2 9857
 
8.3%
; 9361
 
7.9%
0 8808
 
7.4%
4 5756
 
4.8%
1 5645
 
4.8%
9 3225
 
2.7%
Other values (5) 4003
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 82743
69.7%
Other Punctuation 25690
 
21.6%
Space Separator 9915
 
8.4%
Math Symbol 277
 
0.2%
Lowercase Letter 76
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 33736
40.8%
5 12066
 
14.6%
2 9857
 
11.9%
0 8808
 
10.6%
4 5756
 
7.0%
1 5645
 
6.8%
9 3225
 
3.9%
6 1280
 
1.5%
7 1279
 
1.5%
8 1091
 
1.3%
Other Punctuation
ValueCountFrequency (%)
. 16329
63.6%
; 9361
36.4%
Space Separator
ValueCountFrequency (%)
9915
100.0%
Math Symbol
ValueCountFrequency (%)
| 277
100.0%
Lowercase Letter
ValueCountFrequency (%)
x 76
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 118625
99.9%
Latin 76
 
0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
3 33736
28.4%
. 16329
13.8%
5 12066
 
10.2%
9915
 
8.4%
2 9857
 
8.3%
; 9361
 
7.9%
0 8808
 
7.4%
4 5756
 
4.9%
1 5645
 
4.8%
9 3225
 
2.7%
Other values (4) 3927
 
3.3%
Latin
ValueCountFrequency (%)
x 76
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 118701
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 33736
28.4%
. 16329
13.8%
5 12066
 
10.2%
9915
 
8.4%
2 9857
 
8.3%
; 9361
 
7.9%
0 8808
 
7.4%
4 5756
 
4.8%
1 5645
 
4.8%
9 3225
 
2.7%
Other values (5) 4003
 
3.4%

Perovskite_composition_none_stoichiometry_components_in_excess
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 45
Distinct (%) 0.6%
Missing 35521
Missing (%) 83.6%
Memory size 332.1 KiB
PbI2
3212 
Stoichiometric
1390 
MAI
909 
MA
663 
PbI2; PbBr2
 
212
Other values (40)
590 

Length

Max length 31
Median length 20
Mean length 5.9568521
Min length 1

Characters and Unicode

Total characters 41555
Distinct characters 36
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) 0.1%

Sample

1st row MA
2nd row HCl
3rd row HCl
4th row PbI2
5th row PbI2

Common Values

Value              Count  Frequency (%)
PbI2               3212   7.6%
Stoichiometric     1390   3.3%
MAI                909    2.1%
MA                 663    1.6%
PbI2; PbBr2        212    0.5%
Pb                 87     0.2%
PbCl2              78     0.2%
MACl               65     0.2%
PbBr2              43     0.1%
CsBr               38     0.1%
Other values (35)  279    0.7%
(Missing)          35521  83.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pbi2 3460
47.4%
stoichiometric 1405
19.3%
mai 922
 
12.6%
ma 663
 
9.1%
pbbr2 273
 
3.7%
pb 87
 
1.2%
pbcl2 78
 
1.1%
macl 66
 
0.9%
fai 48
 
0.7%
sni2 38
 
0.5%
Other values (27) 252
 
3.5%

Most occurring characters

ValueCountFrequency (%)
I 4548
10.9%
i 4224
10.2%
P 3917
9.4%
b 3912
9.4%
2 3885
9.3%
c 2816
 
6.8%
t 2810
 
6.8%
o 2810
 
6.8%
r 1762
 
4.2%
A 1756
 
4.2%
Other values (26) 9115
21.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22919
55.2%
Uppercase Letter 14106
33.9%
Decimal Number 3916
 
9.4%
Space Separator 316
 
0.8%
Other Punctuation 278
 
0.7%
Math Symbol 19
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I 4548
32.2%
P 3917
27.8%
A 1756
 
12.4%
M 1677
 
11.9%
S 1483
 
10.5%
B 359
 
2.5%
C 241
 
1.7%
F 70
 
0.5%
H 19
 
0.1%
N 17
 
0.1%
Other values (4) 19
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
i 4224
18.4%
b 3912
17.1%
c 2816
12.3%
t 2810
12.3%
o 2810
12.3%
r 1762
7.7%
h 1405
 
6.1%
e 1405
 
6.1%
m 1405
 
6.1%
l 165
 
0.7%
Other values (4) 205
 
0.9%
Decimal Number
ValueCountFrequency (%)
2 3885
99.2%
4 15
 
0.4%
3 15
 
0.4%
5 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
316
100.0%
Other Punctuation
ValueCountFrequency (%)
; 278
100.0%
Math Symbol
ValueCountFrequency (%)
| 19
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37025
89.1%
Common 4530
 
10.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 4548
12.3%
i 4224
11.4%
P 3917
10.6%
b 3912
10.6%
c 2816
7.6%
t 2810
7.6%
o 2810
7.6%
r 1762
 
4.8%
A 1756
 
4.7%
M 1677
 
4.5%
Other values (18) 6793
18.3%
Common
ValueCountFrequency (%)
2 3885
85.8%
316
 
7.0%
; 278
 
6.1%
| 19
 
0.4%
4 15
 
0.3%
3 15
 
0.3%
5 1
 
< 0.1%
- 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41555
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 4548
10.9%
i 4224
10.2%
P 3917
9.4%
b 3912
9.4%
2 3885
9.3%
c 2816
 
6.8%
t 2810
 
6.8%
o 2810
 
6.8%
r 1762
 
4.2%
A 1756
 
4.2%
Other values (26) 9115
21.9%

Perovskite_additives_compounds
Categorical

HIGH CARDINALITY  MISSING 

Distinct979
Distinct (%)7.1%
Missing28675
Missing (%)67.5%
Memory size332.1 KiB
Cl
5182 
Undoped
1313 
Unknown
 
558
5-AVAI
 
349
SnF2
 
240
Other values (974)
6180 

Length

Max length142
Median length90
Mean length5.6151064
Min length1

Characters and Unicode

Total characters77612
Distinct characters84
Distinct categories11
Distinct scripts4
Distinct blocks5

Unique

Unique284
Unique (%)2.1%

Sample

1st rowSnF2
2nd rowSnF2
3rd rowSnF2
4th rowSnF2
5th rowCl

Common Values

Value                 Count    Frequency (%)
Cl                    5182     12.2%
Undoped               1313     3.1%
Unknown               558      1.3%
5-AVAI                349      0.8%
SnF2                  240      0.6%
HI                    210      0.5%
Rb                    175      0.4%
Pb(SCN)2              174      0.4%
Acetate               119      0.3%
KI                    105      0.2%
Other values (969)    5397     12.7%
(Missing)             28675    67.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
cl 5946
37.4%
undoped 1341
 
8.4%
unknown 558
 
3.5%
snf2 428
 
2.7%
5-avai 367
 
2.3%
hi 304
 
1.9%
acetate 285
 
1.8%
pb(scn)2 230
 
1.4%
rb 205
 
1.3%
pbcl2 133
 
0.8%
Other values (866) 6114
38.4%

Most occurring characters

ValueCountFrequency (%)
C 7991
 
10.3%
l 7773
 
10.0%
n 5026
 
6.5%
d 3690
 
4.8%
o 3622
 
4.7%
e 3612
 
4.7%
A 2483
 
3.2%
i 2262
 
2.9%
2088
 
2.7%
p 2080
 
2.7%
Other values (74) 36985
47.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 40836
52.6%
Uppercase Letter 26259
33.8%
Decimal Number 3815
 
4.9%
Space Separator 2088
 
2.7%
Dash Punctuation 1782
 
2.3%
Other Punctuation 1652
 
2.1%
Close Punctuation 550
 
0.7%
Open Punctuation 550
 
0.7%
Math Symbol 61
 
0.1%
Format 18
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 7991
30.4%
A 2483
 
9.5%
U 1963
 
7.5%
P 1920
 
7.3%
I 1826
 
7.0%
H 1411
 
5.4%
S 1236
 
4.7%
N 1102
 
4.2%
F 801
 
3.1%
M 740
 
2.8%
Other values (16) 4786
18.2%
Lowercase Letter
ValueCountFrequency (%)
l 7773
19.0%
n 5026
12.3%
d 3690
9.0%
o 3622
8.9%
e 3612
8.8%
i 2262
 
5.5%
p 2080
 
5.1%
a 1859
 
4.6%
t 1376
 
3.4%
r 1372
 
3.4%
Other values (16) 8164
20.0%
Decimal Number
ValueCountFrequency (%)
2 1771
46.4%
3 472
 
12.4%
4 431
 
11.3%
5 423
 
11.1%
1 260
 
6.8%
0 179
 
4.7%
6 168
 
4.4%
8 49
 
1.3%
7 41
 
1.1%
9 17
 
0.4%
Other Punctuation
ValueCountFrequency (%)
; 1420
86.0%
, 142
 
8.6%
: 39
 
2.4%
@ 29
 
1.8%
. 13
 
0.8%
/ 8
 
0.5%
1
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 1549
86.9%
221
 
12.4%
7
 
0.4%
5
 
0.3%
Math Symbol
ValueCountFrequency (%)
| 55
90.2%
4
 
6.6%
+ 2
 
3.3%
Close Punctuation
ValueCountFrequency (%)
) 523
95.1%
] 27
 
4.9%
Open Punctuation
ValueCountFrequency (%)
( 523
95.1%
[ 27
 
4.9%
Space Separator
ValueCountFrequency (%)
2088
100.0%
Format
ValueCountFrequency (%)
­ 18
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 67084
86.4%
Common 10513
 
13.5%
Greek 11
 
< 0.1%
Arabic 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 7991
 
11.9%
l 7773
 
11.6%
n 5026
 
7.5%
d 3690
 
5.5%
o 3622
 
5.4%
e 3612
 
5.4%
A 2483
 
3.7%
i 2262
 
3.4%
p 2080
 
3.1%
U 1963
 
2.9%
Other values (41) 26582
39.6%
Common
ValueCountFrequency (%)
2088
19.9%
2 1771
16.8%
- 1549
14.7%
; 1420
13.5%
) 523
 
5.0%
( 523
 
5.0%
3 472
 
4.5%
4 431
 
4.1%
5 423
 
4.0%
1 260
 
2.5%
Other values (21) 1053
10.0%
Greek
ValueCountFrequency (%)
α 11
100.0%
Arabic
ValueCountFrequency (%)
۰ 4
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 77341
99.7%
Punctuation 234
 
0.3%
None 29
 
< 0.1%
Arabic 4
 
< 0.1%
Math Operators 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 7991
 
10.3%
l 7773
 
10.1%
n 5026
 
6.5%
d 3690
 
4.8%
o 3622
 
4.7%
e 3612
 
4.7%
A 2483
 
3.2%
i 2262
 
2.9%
2088
 
2.7%
p 2080
 
2.7%
Other values (66) 36714
47.5%
Punctuation
ValueCountFrequency (%)
221
94.4%
7
 
3.0%
5
 
2.1%
1
 
0.4%
None
ValueCountFrequency (%)
­ 18
62.1%
α 11
37.9%
Arabic
ValueCountFrequency (%)
۰ 4
100.0%
Math Operators
ValueCountFrequency (%)
4
100.0%

Perovskite_additives_concentrations
Categorical

HIGH CARDINALITY  MISSING 

Distinct706
Distinct (%)14.4%
Missing37606
Missing (%)88.5%
Memory size332.1 KiB
0.05
395 
0.1
371 
0.01
 
238
0.03
 
171
0.02
 
158
Other values (701)
3558 

Length

Max length31
Median length28
Mean length5.73257
Min length1

Characters and Unicode

Total characters28038
Distinct characters45
Distinct categories9
Distinct scripts2
Distinct blocks3

Unique

Unique300
Unique (%)6.1%

Sample

1st row0.2
2nd row0.2
3rd row0.2
4th row0.2
5th row0.25

Common Values

Value                 Count    Frequency (%)
0.05                  395      0.9%
0.1                   371      0.9%
0.01                  238      0.6%
0.03                  171      0.4%
0.02                  158      0.4%
0.66                  110      0.3%
0.15                  103      0.2%
0.2                   94       0.2%
0.25                  79       0.2%
0.5                   76       0.2%
Other values (696)    3096     7.3%
(Missing)             37606    88.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0.05 499
 
6.9%
0.1 487
 
6.7%
mol 455
 
6.3%
nan 367
 
5.1%
0.01 321
 
4.4%
mg/ml 294
 
4.0%
wt 293
 
4.0%
(empty token) 278
 
3.8%
0.03 195
 
2.7%
1 188
 
2.6%
Other values (278) 3887
53.5%

Most occurring characters

ValueCountFrequency (%)
0 6879
24.5%
. 4146
14.8%
2388
 
8.5%
1 1927
 
6.9%
5 1755
 
6.3%
m 1161
 
4.1%
% 1139
 
4.1%
2 1009
 
3.6%
l 946
 
3.4%
3 841
 
3.0%
Other values (35) 5847
20.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 14084
50.2%
Other Punctuation 6305
22.5%
Lowercase Letter 5003
 
17.8%
Space Separator 2388
 
8.5%
Uppercase Letter 200
 
0.7%
Math Symbol 35
 
0.1%
Dash Punctuation 17
 
0.1%
Connector Punctuation 5
 
< 0.1%
Format 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m 1161
23.2%
l 946
18.9%
n 742
14.8%
o 625
12.5%
a 371
 
7.4%
g 313
 
6.3%
t 309
 
6.2%
w 299
 
6.0%
v 146
 
2.9%
e 21
 
0.4%
Other values (7) 70
 
1.4%
Decimal Number
ValueCountFrequency (%)
0 6879
48.8%
1 1927
 
13.7%
5 1755
 
12.5%
2 1009
 
7.2%
3 841
 
6.0%
6 667
 
4.7%
7 313
 
2.2%
4 293
 
2.1%
8 216
 
1.5%
9 184
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
M 108
54.0%
A 16
 
8.0%
G 15
 
7.5%
E 15
 
7.5%
P 15
 
7.5%
L 14
 
7.0%
V 8
 
4.0%
I 8
 
4.0%
W 1
 
0.5%
Other Punctuation
ValueCountFrequency (%)
. 4146
65.8%
% 1139
 
18.1%
; 691
 
11.0%
/ 329
 
5.2%
Space Separator
ValueCountFrequency (%)
2388
100.0%
Math Symbol
ValueCountFrequency (%)
| 35
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 17
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 5
100.0%
Format
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 22846
81.5%
Latin 5192
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
m 1161
22.4%
l 946
18.2%
n 742
14.3%
o 625
12.0%
a 371
 
7.1%
g 313
 
6.0%
t 309
 
6.0%
w 299
 
5.8%
v 146
 
2.8%
M 108
 
2.1%
Other values (15) 172
 
3.3%
Common
ValueCountFrequency (%)
0 6879
30.1%
. 4146
18.1%
2388
 
10.5%
1 1927
 
8.4%
5 1755
 
7.7%
% 1139
 
5.0%
2 1009
 
4.4%
3 841
 
3.7%
; 691
 
3.0%
6 667
 
2.9%
Other values (10) 1404
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28026
> 99.9%
None 11
 
< 0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 6879
24.5%
. 4146
14.8%
2388
 
8.5%
1 1927
 
6.9%
5 1755
 
6.3%
m 1161
 
4.1%
% 1139
 
4.1%
2 1009
 
3.6%
l 946
 
3.4%
3 841
 
3.0%
Other values (33) 5835
20.8%
None
ValueCountFrequency (%)
µ 11
100.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

Perovskite_thickness
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing29124
Missing (%)68.5%
Memory size332.1 KiB

Perovskite_band_gap
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing10585
Missing (%)24.9%
Memory size332.1 KiB

Perovskite_band_gap_graded
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size41.6 KiB
Value    Count    Frequency (%)
False    42437    99.9%
True     60       0.1%

Perovskite_pl_max
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing32254
Missing (%)75.9%
Memory size332.1 KiB
Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.2792197
Minimum0
Maximum12
Zeros94
Zeros (%)0.2%
Negative0
Negative (%)0.0%
Memory size332.1 KiB

Quantile statistics

Minimum0
5-th percentile1
Q11
median1
Q32
95-th percentile2
Maximum12
Range12
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.51945352
Coefficient of variation (CV)0.4060706
Kurtosis41.145649
Mean1.2792197
Median Absolute Deviation (MAD)0
Skewness3.4095069
Sum54363
Variance0.26983196
MonotonicityNot monotonic
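The descriptive statistics above can be cross-checked against the value counts reported for this variable (values 0 through 12). The sketch below recomputes the sum, mean, variance, standard deviation and coefficient of variation from those counts. It is a worked check under one assumption: the variance uses the sample form with n - 1 in the denominator, which is the form that reproduces the reported figure.

```python
import math

# Value -> count, copied from the frequency table reported for this variable.
value_counts = {0: 94, 1: 31191, 2: 10660, 3: 475, 4: 53,
                5: 2, 6: 5, 7: 1, 10: 12, 12: 4}

n = sum(value_counts.values())                        # number of observations
total = sum(v * c for v, c in value_counts.items())   # reported as "Sum"
mean = total / n

# Sample variance (n - 1 denominator); assumed, see the note above.
ss = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
variance = ss / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation = standard deviation / mean

print(n, total)
print(round(mean, 7), round(variance, 8), round(std, 8), round(cv, 7))
```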
Histogram with fixed size bins (bins=10)
Value    Count    Frequency (%)
1        31191    73.4%
2        10660    25.1%
3        475      1.1%
0        94       0.2%
4        53       0.1%
10       12       < 0.1%
6        5        < 0.1%
12       4        < 0.1%
5        2        < 0.1%
7        1        < 0.1%

Value    Count    Frequency (%)
0        94       0.2%
1        31191    73.4%
2        10660    25.1%
3        475      1.1%
4        53       0.1%
5        2        < 0.1%
6        5        < 0.1%
7        1        < 0.1%
10       12       < 0.1%
12       4        < 0.1%

Value    Count    Frequency (%)
12       4        < 0.1%
10       12       < 0.1%
7        1        < 0.1%
6        5        < 0.1%
5        2        < 0.1%
4        53       0.1%
3        475      1.1%
2        10660    25.1%
1        31191    73.4%
0        94       0.2%

Perovskite_deposition_procedure
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct198
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size332.1 KiB
Spin-coating
29100 
Spin-coating >> Spin-coating
5471 
Spin-coating >> CBD
3178 
Drop-infiltration
 
632
Spin-coating >> Gas reaction
 
551
Other values (193)
3565 

Length

Max length156
Median length12
Mean length15.895146
Min length3

Characters and Unicode

Total characters675496
Distinct characters47
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique29
Unique (%)0.1%

Sample

1st rowSpin-coating
2nd rowSpin-coating
3rd rowSpin-coating
4th rowSpin-coating
5th rowSpin-coating

Common Values

Value                           Count    Frequency (%)
Spin-coating                    29100    68.5%
Spin-coating >> Spin-coating    5471     12.9%
Spin-coating >> CBD             3178     7.5%
Drop-infiltration               632      1.5%
Spin-coating >> Gas reaction    551      1.3%
Co-evaporation                  398      0.9%
Doctor blading                  266      0.6%
Unknown                         147      0.3%
Slot-die coating                143      0.3%
Evaporation >> Gas reaction     134      0.3%
Other values (188)              2477     5.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
spin-coating 45545
66.5%
(empty token) 11898
 
17.4%
cbd 3657
 
5.3%
evaporation 944
 
1.4%
reaction 928
 
1.4%
gas 872
 
1.3%
drop-infiltration 772
 
1.1%
co-evaporation 434
 
0.6%
doctor 291
 
0.4%
blading 291
 
0.4%
Other values (78) 2807
 
4.1%
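The word table above contains one row with no visible value (count 11898). That count is consistent with separator tokens that become empty once punctuation is stripped: the 23558 ">" characters reported for this variable form 11779 ">>" tokens, and adding the 119 "|" tokens gives 11898. The sketch below shows one tokenisation that produces such a row. It is an assumption about how the table was derived (lowercase, split on whitespace, strip punctuation from the token edges), not a confirmed description of the profiler, and `word_counts` is a name introduced here.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, strip edge punctuation, then count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # strip() only trims the ends, so "spin-coating" keeps its hyphen
            # while a bare ">>" or "|" collapses to the empty string.
            counts[token.strip(string.punctuation)] += 1
    return counts


sample = ["Spin-coating", "Spin-coating >> Spin-coating", "Spin-coating >> CBD"]
for token, n in word_counts(sample).most_common():
    print(repr(token), n)
```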

Most occurring characters

ValueCountFrequency (%)
i 98506
14.6%
n 97670
14.5%
o 53758
8.0%
a 53223
7.9%
t 51610
7.6%
p 49009
7.3%
c 48107
7.1%
- 47471
7.0%
g 47087
7.0%
S 46039
6.8%
Other values (37) 83016
12.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 516571
76.5%
Uppercase Letter 61835
 
9.2%
Dash Punctuation 47471
 
7.0%
Space Separator 25942
 
3.8%
Math Symbol 23677
 
3.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 98506
19.1%
n 97670
18.9%
o 53758
10.4%
a 53223
10.3%
t 51610
10.0%
p 49009
9.5%
c 48107
9.3%
g 47087
9.1%
r 5573
 
1.1%
e 2401
 
0.5%
Other values (15) 9627
 
1.9%
Uppercase Letter
ValueCountFrequency (%)
S 46039
74.5%
D 5212
 
8.4%
C 4201
 
6.8%
B 3671
 
5.9%
E 1007
 
1.6%
G 894
 
1.4%
U 299
 
0.5%
R 151
 
0.2%
I 137
 
0.2%
A 69
 
0.1%
Other values (8) 155
 
0.3%
Math Symbol
ValueCountFrequency (%)
> 23558
99.5%
| 119
 
0.5%
Dash Punctuation
ValueCountFrequency (%)
- 47471
100.0%
Space Separator
ValueCountFrequency (%)
25942
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 578406
85.6%
Common 97090
 
14.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 98506
17.0%
n 97670
16.9%
o 53758
9.3%
a 53223
9.2%
t 51610
8.9%
p 49009
8.5%
c 48107
8.3%
g 47087
8.1%
S 46039
8.0%
r 5573
 
1.0%
Other values (33) 27824
 
4.8%
Common
ValueCountFrequency (%)
- 47471
48.9%
25942
26.7%
> 23558
24.3%
| 119
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 675496
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 98506
14.6%
n 97670
14.5%
o 53758
8.0%
a 53223
7.9%
t 51610
7.6%
p 49009
7.3%
c 48107
7.1%
- 47471
7.0%
g 47087
7.0%
S 46039
6.8%
Other values (37) 83016
12.3%

Perovskite_deposition_synthesis_atmosphere
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct186
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size332.1 KiB
Unknown
14843 
N2
14652 
Air
3448 
N2 >> N2
2805 
Air >> Air
1962 
Other values (181)
4787 

Length

Max length56
Median length52
Mean length5.630068
Min length2

Characters and Unicode

Total characters239261
Distinct characters52
Distinct categories7
Distinct scripts2
Distinct blocks1

Unique

Unique37
Unique (%)0.1%

Sample

1st rowN2
2nd rowN2
3rd rowN2
4th rowN2
5th rowN2

Common Values

Value                 Count    Frequency (%)
Unknown               14843    34.9%
N2                    14652    34.5%
Air                   3448     8.1%
N2 >> N2              2805     6.6%
Air >> Air            1962     4.6%
Ar                    692      1.6%
Dry air               651      1.5%
Vacuum                465      1.1%
Ar >> Ar              212      0.5%
Dry air >> Dry air    204      0.5%
Other values (176)    2563     6.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
n2 21666
35.7%
unknown 15265
25.2%
air 9996
16.5%
(empty token) 8067
 
13.3%
vacuum 1679
 
2.8%
ar 1203
 
2.0%
dry 1131
 
1.9%
mai 582
 
1.0%
ambient 428
 
0.7%
methylamin 146
 
0.2%
Other values (40) 462
 
0.8%

Most occurring characters

ValueCountFrequency (%)
n 46453
19.4%
2 21723
9.1%
N 21676
9.1%
18128
 
7.6%
> 15930
 
6.7%
o 15341
 
6.4%
U 15265
 
6.4%
k 15265
 
6.4%
w 15265
 
6.4%
r 12471
 
5.2%
Other values (42) 41744
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 129273
54.0%
Uppercase Letter 53220
22.2%
Decimal Number 21747
 
9.1%
Space Separator 18128
 
7.6%
Math Symbol 16029
 
6.7%
Other Punctuation 861
 
0.4%
Dash Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 46453
35.9%
o 15341
 
11.9%
k 15265
 
11.8%
w 15265
 
11.8%
r 12471
 
9.6%
i 10600
 
8.2%
u 3362
 
2.6%
a 2969
 
2.3%
m 2261
 
1.7%
c 1683
 
1.3%
Other values (11) 3603
 
2.8%
Uppercase Letter
ValueCountFrequency (%)
N 21676
40.7%
U 15265
28.7%
A 11352
21.3%
V 1679
 
3.2%
D 1149
 
2.2%
M 882
 
1.7%
I 751
 
1.4%
F 122
 
0.2%
B 103
 
0.2%
C 101
 
0.2%
Other values (8) 140
 
0.3%
Decimal Number
ValueCountFrequency (%)
2 21723
99.9%
4 12
 
0.1%
5 6
 
< 0.1%
1 2
 
< 0.1%
7 2
 
< 0.1%
0 2
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
; 856
99.4%
@ 3
 
0.3%
, 2
 
0.2%
Math Symbol
ValueCountFrequency (%)
> 15930
99.4%
| 99
 
0.6%
Space Separator
ValueCountFrequency (%)
18128
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 182493
76.3%
Common 56768
 
23.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 46453
25.5%
N 21676
11.9%
o 15341
 
8.4%
U 15265
 
8.4%
k 15265
 
8.4%
w 15265
 
8.4%
r 12471
 
6.8%
A 11352
 
6.2%
i 10600
 
5.8%
u 3362
 
1.8%
Other values (29) 15443
 
8.5%
Common
ValueCountFrequency (%)
2 21723
38.3%
18128
31.9%
> 15930
28.1%
; 856
 
1.5%
| 99
 
0.2%
4 12
 
< 0.1%
5 6
 
< 0.1%
- 3
 
< 0.1%
@ 3
 
< 0.1%
, 2
 
< 0.1%
Other values (3) 6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 239261
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 46453
19.4%
2 21723
9.1%
N 21676
9.1%
18128
 
7.6%
> 15930
 
6.7%
o 15341
 
6.4%
U 15265
 
6.4%
k 15265
 
6.4%
w 15265
 
6.4%
r 12471
 
5.2%
Other values (42) 41744
17.4%

Perovskite_deposition_solvents
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct343
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Memory size332.1 KiB
DMF; DMSO
13990 
DMF
8699 
DMF >> IPA
5574 
DMSO; GBL
3019 
DMF; DMSO >> IPA
1903 
Other values (338)
9312 
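Values such as "DMF; DMSO >> IPA" pack several items into one cell. The sketch below splits such a cell into nested lists. It rests on an assumption that this report does not state: that ">>" separates successive deposition steps and ";" separates the components within a step. The "|" separator that also occurs in this column is not handled, and `parse_steps` is a hypothetical helper name.

```python
def parse_steps(cell):
    """Split a cell into steps (">>") and each step into components (";")."""
    return [
        [part.strip() for part in step.split(";")]
        for step in cell.split(">>")
    ]


# Two values taken from this variable's most frequent entries.
print(parse_steps("DMF; DMSO >> IPA"))
print(parse_steps("DMF"))
```

Parsed this way, single-step and multi-step entries can be compared component by component instead of as opaque strings, which is one way to reduce the high cardinality flagged for this column.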

Length

Max length213
Median length206
Mean length8.5521802
Min length3

Characters and Unicode

Total characters363442
Distinct characters58
Distinct categories9
Distinct scripts2
Distinct blocks2

Unique

Unique66
Unique (%)0.2%

Sample

1st rowDMSO
2nd rowDMSO
3rd rowDMSO
4th rowDMSO
5th rowDMSO

Common Values

Value                 Count    Frequency (%)
DMF; DMSO             13990    32.9%
DMF                   8699     20.5%
DMF >> IPA            5574     13.1%
DMSO; GBL             3019     7.1%
DMF; DMSO >> IPA      1903     4.5%
DMSO                  1747     4.1%
Unknown               1075     2.5%
GBL                   974      2.3%
DMF >> none           621      1.5%
none                  502      1.2%
Other values (333)    4393     10.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
dmf 33577
38.2%
dmso 21850
24.9%
(empty token) 11767
 
13.4%
ipa 9078
 
10.3%
gbl 4537
 
5.2%
none 2507
 
2.9%
unknown 1186
 
1.4%
nmp 415
 
0.5%
methanol 397
 
0.5%
ethanol 362
 
0.4%
Other values (120) 2166
 
2.5%

Most occurring characters

ValueCountFrequency (%)
M 56567
15.6%
D 55576
15.3%
45345
12.5%
F 33614
9.2%
> 23276
 
6.4%
O 22185
 
6.1%
S 21850
 
6.0%
; 21481
 
5.9%
n 10538
 
2.9%
P 9621
 
2.6%
Other values (48) 63389
17.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 234213
64.4%
Space Separator 45345
 
12.5%
Lowercase Letter 38270
 
10.5%
Math Symbol 23381
 
6.4%
Other Punctuation 21516
 
5.9%
Decimal Number 357
 
0.1%
Dash Punctuation 310
 
0.1%
Open Punctuation 25
 
< 0.1%
Close Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 10538
27.5%
e 5734
15.0%
o 5622
14.7%
t 2700
 
7.1%
a 2301
 
6.0%
l 2094
 
5.5%
h 1658
 
4.3%
k 1186
 
3.1%
w 1186
 
3.1%
i 1075
 
2.8%
Other values (13) 4176
 
10.9%
Uppercase Letter
ValueCountFrequency (%)
M 56567
24.2%
D 55576
23.7%
F 33614
14.4%
O 22185
 
9.5%
S 21850
 
9.3%
P 9621
 
4.1%
A 9160
 
3.9%
I 9106
 
3.9%
B 4614
 
2.0%
G 4550
 
1.9%
Other values (9) 7370
 
3.1%
Decimal Number
ValueCountFrequency (%)
2 287
80.4%
1 27
 
7.6%
7 19
 
5.3%
3 12
 
3.4%
4 7
 
2.0%
9 5
 
1.4%
Other Punctuation
ValueCountFrequency (%)
; 21481
99.8%
@ 24
 
0.1%
, 11
 
0.1%
Math Symbol
ValueCountFrequency (%)
> 23276
99.6%
| 105
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 309
99.7%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
45345
100.0%
Open Punctuation
ValueCountFrequency (%)
( 25
100.0%
Close Punctuation
ValueCountFrequency (%)
) 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 272483
75.0%
Common 90959
 
25.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 56567
20.8%
D 55576
20.4%
F 33614
12.3%
O 22185
 
8.1%
S 21850
 
8.0%
n 10538
 
3.9%
P 9621
 
3.5%
A 9160
 
3.4%
I 9106
 
3.3%
e 5734
 
2.1%
Other values (32) 38532
14.1%
Common
ValueCountFrequency (%)
45345
49.9%
> 23276
25.6%
; 21481
23.6%
- 309
 
0.3%
2 287
 
0.3%
| 105
 
0.1%
1 27
 
< 0.1%
( 25
 
< 0.1%
) 25
 
< 0.1%
@ 24
 
< 0.1%
Other values (6) 55
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 363441
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 56567
15.6%
D 55576
15.3%
45345
12.5%
F 33614
9.2%
> 23276
 
6.4%
O 22185
 
6.1%
S 21850
 
6.0%
; 21481
 
5.9%
n 10538
 
2.9%
P 9621
 
2.6%
Other values (47) 63388
17.4%
Punctuation
ValueCountFrequency (%)
1
100.0%

Perovskite_deposition_solvents_mixing_ratios
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct528
Distinct (%)1.3%
Missing2125
Missing (%)5.0%
Memory size332.1 KiB
1
12152 
1 >> 1
7617 
4; 1
7593 
3; 7
2440 
9; 1
2153 
Other values (523)
8417 

Length

Max length31
Median length26
Mean length4.0800307
Min length1

Characters and Unicode

Total characters164719
Distinct characters17
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique172
Unique (%)0.4%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value                 Count    Frequency (%)
1                     12152    28.6%
1 >> 1                7617     17.9%
4; 1                  7593     17.9%
3; 7                  2440     5.7%
9; 1                  2153     5.1%
7; 3                  1040     2.4%
9; 1 >> 1             584      1.4%
1; 1                  481      1.1%
4; 1 >> 1             427      1.0%
1 >> 1 >> 1           266      0.6%
Other values (518)    5619     13.2%
(Missing)             2125     5.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 46289
55.3%
(empty token) 11491
 
13.7%
4 8749
 
10.5%
3 4418
 
5.3%
7 3971
 
4.7%
9 3158
 
3.8%
2 680
 
0.8%
nan 499
 
0.6%
6 433
 
0.5%
8 428
 
0.5%
Other values (224) 3578
 
4.3%

Most occurring characters

ValueCountFrequency (%)
1 47859
29.1%
43322
26.3%
> 22780
13.8%
; 20340
12.3%
4 8949
 
5.4%
3 4837
 
2.9%
7 4613
 
2.8%
9 3799
 
2.3%
0 1622
 
1.0%
5 1289
 
0.8%
Other values (7) 5309
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 75836
46.0%
Space Separator 43322
26.3%
Math Symbol 22881
 
13.9%
Other Punctuation 21183
 
12.9%
Lowercase Letter 1497
 
0.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 47859
63.1%
4 8949
 
11.8%
3 4837
 
6.4%
7 4613
 
6.1%
9 3799
 
5.0%
0 1622
 
2.1%
5 1289
 
1.7%
2 1220
 
1.6%
8 873
 
1.2%
6 775
 
1.0%
Math Symbol
ValueCountFrequency (%)
> 22780
99.6%
| 101
 
0.4%
Other Punctuation
ValueCountFrequency (%)
; 20340
96.0%
. 843
 
4.0%
Lowercase Letter
ValueCountFrequency (%)
n 998
66.7%
a 499
33.3%
Space Separator
ValueCountFrequency (%)
43322
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 163222
99.1%
Latin 1497
 
0.9%

Most frequent character per script

Common
ValueCountFrequency (%)
1 47859
29.3%
43322
26.5%
> 22780
14.0%
; 20340
12.5%
4 8949
 
5.5%
3 4837
 
3.0%
7 4613
 
2.8%
9 3799
 
2.3%
0 1622
 
1.0%
5 1289
 
0.8%
Other values (5) 3812
 
2.3%
Latin
ValueCountFrequency (%)
n 998
66.7%
a 499
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 164719
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 47859
29.1%
43322
26.3%
> 22780
13.8%
; 20340
12.3%
4 8949
 
5.4%
3 4837
 
2.9%
7 4613
 
2.8%
9 3799
 
2.3%
0 1622
 
1.0%
5 1289
 
0.8%
Other values (7) 5309
 
3.2%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size41.6 KiB
Value    Count    Frequency (%)
False    22278    52.4%
True     20219    47.6%

Perovskite_deposition_quenching_media
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct94
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size332.1 KiB
Unknown
22375 
Chlorobenzene
10363 
Toluene
3814 
Diethyl ether
2788 
Ethyl acetate
 
769
Other values (89)
2388 

Length

Max length37
Median length7
Mean length9.0337906
Min length2

Characters and Unicode

Total characters383909
Distinct characters48
Distinct categories7
Distinct scripts2
Distinct blocks1

Unique

Unique11
Unique (%)< 0.1%

Sample

1st rowUnknown
2nd rowUnknown
3rd rowUnknown
4th rowUnknown
5th rowUnknown

Common Values

Value                Count    Frequency (%)
Unknown              22375    52.7%
Chlorobenzene        10363    24.4%
Toluene              3814     9.0%
Diethyl ether        2788     6.6%
Ethyl acetate        769      1.8%
N2                   347      0.8%
Vacuum               279      0.7%
Anisole              248      0.6%
2-Butanol            150      0.4%
Ar                   138      0.3%
Other values (84)    1226     2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 22375
47.9%
chlorobenzene 10487
22.5%
toluene 3860
 
8.3%
ether 3061
 
6.6%
diethyl 2806
 
6.0%
ethyl 947
 
2.0%
acetate 843
 
1.8%
n2 413
 
0.9%
anisole 299
 
0.6%
vacuum 279
 
0.6%
Other values (69) 1324
 
2.8%

Most occurring characters

ValueCountFrequency (%)
n 92871
24.2%
e 50707
13.2%
o 48675
12.7%
U 22375
 
5.8%
k 22375
 
5.8%
w 22375
 
5.8%
l 19191
 
5.0%
h 17593
 
4.6%
r 14378
 
3.7%
C 10571
 
2.8%
Other values (38) 62798
16.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 335655
87.4%
Uppercase Letter 43018
 
11.2%
Space Separator 4197
 
1.1%
Decimal Number 583
 
0.2%
Other Punctuation 198
 
0.1%
Dash Punctuation 184
 
< 0.1%
Math Symbol 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 92871
27.7%
e 50707
15.1%
o 48675
14.5%
k 22375
 
6.7%
w 22375
 
6.7%
l 19191
 
5.7%
h 17593
 
5.2%
r 14378
 
4.3%
b 10524
 
3.1%
z 10507
 
3.1%
Other values (15) 26459
 
7.9%
Uppercase Letter
ValueCountFrequency (%)
U 22375
52.0%
C 10571
24.6%
T 3994
 
9.3%
D 2899
 
6.7%
E 1042
 
2.4%
A 704
 
1.6%
N 423
 
1.0%
V 279
 
0.6%
B 170
 
0.4%
I 143
 
0.3%
Other values (8) 418
 
1.0%
Space Separator
ValueCountFrequency (%)
4197
100.0%
Decimal Number
ValueCountFrequency (%)
2 583
100.0%
Other Punctuation
ValueCountFrequency (%)
; 198
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 184
100.0%
Math Symbol
ValueCountFrequency (%)
> 74
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 378673
98.6%
Common 5236
 
1.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 92871
24.5%
e 50707
13.4%
o 48675
12.9%
U 22375
 
5.9%
k 22375
 
5.9%
w 22375
 
5.9%
l 19191
 
5.1%
h 17593
 
4.6%
r 14378
 
3.8%
C 10571
 
2.8%
Other values (33) 57562
15.2%
Common
ValueCountFrequency (%)
4197
80.2%
2 583
 
11.1%
; 198
 
3.8%
- 184
 
3.5%
> 74
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 383909
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 92871
24.2%
e 50707
13.2%
o 48675
12.7%
U 22375
 
5.8%
k 22375
 
5.8%
w 22375
 
5.8%
l 19191
 
5.0%
h 17593
 
4.6%
r 14378
 
3.7%
C 10571
 
2.8%
Other values (38) 62798
16.4%

Perovskite_deposition_quenching_media_mixing_ratios
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing41781
Missing (%)98.3%
Memory size332.1 KiB

Perovskite_deposition_quenching_media_additives_compounds
Categorical

HIGH CARDINALITY  HIGH CORRELATION  MISSING 

Distinct84
Distinct (%)9.6%
Missing41618
Missing (%)97.9%
Memory size332.1 KiB
Undoped
527 
PCBM-60
 
40
MAI
 
21
IEICO-4F
 
14
CsPbBr3-QDs
 
13
Other values (79)
264 

Length

Max length49
Median length7
Mean length6.9283276
Min length2

Characters and Unicode

Total characters6090
Distinct characters62
Distinct categories8
Distinct scripts2
Distinct blocks2

Unique

Unique25
Unique (%)2.8%

Sample

1st rowBAI
2nd rowBAI
3rd rowBAI
4th rowBAI
5th rowC60; PEG

Common Values

Value                Count    Frequency (%)
Undoped              527      1.2%
PCBM-60              40       0.1%
MAI                  21       < 0.1%
IEICO-4F             14       < 0.1%
CsPbBr3-QDs          13       < 0.1%
PTAA                 12       < 0.1%
Spiro-MeOTAD         10       < 0.1%
BHT                  10       < 0.1%
Polyurethane         9        < 0.1%
SWCNTs               8        < 0.1%
Other values (74)    215      0.5%
(Missing)            41618    97.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
undoped 527
58.2%
pcbm-60 46
 
5.1%
mai 22
 
2.4%
ieico-4f 14
 
1.5%
cspbbr3-qds 13
 
1.4%
ptaa 12
 
1.3%
spiro-meotad 10
 
1.1%
bht 10
 
1.1%
itic 9
 
1.0%
polyurethane 9
 
1.0%
Other values (73) 234
25.8%

Most occurring characters

ValueCountFrequency (%)
d 1067
17.5%
e 595
 
9.8%
n 579
 
9.5%
p 560
 
9.2%
o 557
 
9.1%
U 527
 
8.7%
P 195
 
3.2%
- 155
 
2.5%
C 148
 
2.4%
A 145
 
2.4%
Other values (52) 1562
25.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3805
62.5%
Uppercase Letter 1846
30.3%
Decimal Number 211
 
3.5%
Dash Punctuation 159
 
2.6%
Space Separator 27
 
0.4%
Other Punctuation 26
 
0.4%
Close Punctuation 8
 
0.1%
Open Punctuation 8
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 527
28.5%
P 195
 
10.6%
C 148
 
8.0%
A 145
 
7.9%
M 145
 
7.9%
B 139
 
7.5%
I 112
 
6.1%
T 76
 
4.1%
D 74
 
4.0%
S 55
 
3.0%
Other values (12) 230
12.5%
Lowercase Letter
ValueCountFrequency (%)
d 1067
28.0%
e 595
15.6%
n 579
15.2%
p 560
14.7%
o 557
14.6%
r 74
 
1.9%
s 73
 
1.9%
b 56
 
1.5%
l 39
 
1.0%
a 39
 
1.0%
Other values (11) 166
 
4.4%
Decimal Number
ValueCountFrequency (%)
0 61
28.9%
6 59
28.0%
3 39
18.5%
4 25
11.8%
2 12
 
5.7%
1 8
 
3.8%
7 4
 
1.9%
9 3
 
1.4%
Other Punctuation
ValueCountFrequency (%)
; 14
53.8%
@ 5
 
19.2%
, 5
 
19.2%
: 2
 
7.7%
Dash Punctuation
ValueCountFrequency (%)
- 155
97.5%
4
 
2.5%
Close Punctuation
ValueCountFrequency (%)
) 4
50.0%
] 4
50.0%
Open Punctuation
ValueCountFrequency (%)
( 4
50.0%
[ 4
50.0%
Space Separator
ValueCountFrequency (%)
27
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5651
92.8%
Common 439
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 1067
18.9%
e 595
10.5%
n 579
10.2%
p 560
9.9%
o 557
9.9%
U 527
9.3%
P 195
 
3.5%
C 148
 
2.6%
A 145
 
2.6%
M 145
 
2.6%
Other values (33) 1133
20.0%
Common
ValueCountFrequency (%)
- 155
35.3%
0 61
 
13.9%
6 59
 
13.4%
3 39
 
8.9%
27
 
6.2%
4 25
 
5.7%
; 14
 
3.2%
2 12
 
2.7%
1 8
 
1.8%
@ 5
 
1.1%
Other values (9) 34
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6086
99.9%
Punctuation 4
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
d 1067
17.5%
e 595
 
9.8%
n 579
 
9.5%
p 560
 
9.2%
o 557
 
9.2%
U 527
 
8.7%
P 195
 
3.2%
- 155
 
2.5%
C 148
 
2.4%
A 145
 
2.4%
Other values (51) 1558
25.6%
Punctuation
ValueCountFrequency (%)
4
100.0%
Distinct870
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size332.1 KiB
100.0
14194 
Unknown
3442 
100
2803 
Unknown >> 100.0
 
1345
150.0
 
901
Other values (865)
19812 

Length

Max length68
Median length61
Mean length7.4162411
Min length2

Characters and Unicode

Total characters315168
Distinct characters20
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique147
Unique (%)0.3%

Sample

1st row100
2nd row100
3rd row100
4th row100
5th row100

Common Values

Value Count Frequency (%)
100.0 14194 33.4%
Unknown 3442 8.1%
100 2803 6.6%
Unknown >> 100.0 1345 3.2%
150.0 901 2.1%
90.0 689 1.6%
70.0 >> 70.0 683 1.6%
70.0 >> 100.0 673 1.6%
65; 100 605 1.4%
120.0 567 1.3%
Other values (860) 16595 39.0%
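
The table above counts `100.0` (14194 rows) and `100` (2803 rows) as separate categories, and some cells hold delimited strings such as `70.0 >> 100.0` or `65; 100`. As a rough sketch, assuming `>>` separates sequential steps and `;` separates values within one step (the report does not define these delimiters), such cells could be parsed into numbers before analysis. The function name `parse_steps` is illustrative, and cells containing `|` are not handled here.

```python
def parse_steps(raw):
    """Return one list per '>>'-separated step.

    Each inner list holds floats, or None for tokens that are not numeric
    (for example "Unknown"). The delimiter meanings are an assumption.
    """
    steps = []
    for step in raw.split(">>"):
        values = []
        for token in step.split(";"):
            try:
                values.append(float(token.strip()))
            except ValueError:
                values.append(None)  # non-numeric token such as "Unknown"
        steps.append(values)
    return steps


print(parse_steps("70.0 >> 100.0"))
print(parse_steps("65; 100"))
# "100" and "100.0" parse to the same value, so they would fall into one category.
print(parse_steps("100") == parse_steps("100.0"))
```

Parsing first and counting afterwards would merge spellings that differ only in formatting, which may reduce the reported number of distinct values.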

Length

Histogram of lengths of the category
ValueCountFrequency (%)
100.0 18459
26.8%
10957
15.9%
unknown 8536
12.4%
100 6300
 
9.1%
70.0 4118
 
6.0%
150.0 1852
 
2.7%
90.0 1462
 
2.1%
70 1178
 
1.7%
80.0 990
 
1.4%
60 889
 
1.3%
Other values (139) 14202
20.6%

Most occurring characters

ValueCountFrequency (%)
0 104886
33.3%
1 34767
 
11.0%
. 34574
 
11.0%
26446
 
8.4%
n 25608
 
8.1%
> 21450
 
6.8%
5 9189
 
2.9%
w 8536
 
2.7%
o 8536
 
2.7%
k 8536
 
2.7%
Other values (10) 32640
 
10.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 168182
53.4%
Lowercase Letter 51216
 
16.3%
Other Punctuation 39106
 
12.4%
Space Separator 26446
 
8.4%
Math Symbol 21682
 
6.9%
Uppercase Letter 8536
 
2.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 104886
62.4%
1 34767
 
20.7%
5 9189
 
5.5%
7 5921
 
3.5%
2 3365
 
2.0%
6 2834
 
1.7%
9 2556
 
1.5%
8 2199
 
1.3%
3 1408
 
0.8%
4 1057
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
n 25608
50.0%
w 8536
 
16.7%
o 8536
 
16.7%
k 8536
 
16.7%
Other Punctuation
ValueCountFrequency (%)
. 34574
88.4%
; 4532
 
11.6%
Math Symbol
ValueCountFrequency (%)
> 21450
98.9%
| 232
 
1.1%
Space Separator
ValueCountFrequency (%)
26446
100.0%
Uppercase Letter
ValueCountFrequency (%)
U 8536
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 255416
81.0%
Latin 59752
 
19.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 104886
41.1%
1 34767
 
13.6%
. 34574
 
13.5%
26446
 
10.4%
> 21450
 
8.4%
5 9189
 
3.6%
7 5921
 
2.3%
; 4532
 
1.8%
2 3365
 
1.3%
6 2834
 
1.1%
Other values (5) 7452
 
2.9%
Latin
ValueCountFrequency (%)
n 25608
42.9%
w 8536
 
14.3%
o 8536
 
14.3%
k 8536
 
14.3%
U 8536
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 315168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 104886
33.3%
1 34767
 
11.0%
. 34574
 
11.0%
26446
 
8.4%
n 25608
 
8.1%
> 21450
 
6.8%
5 9189
 
2.9%
w 8536
 
2.7%
o 8536
 
2.7%
k 8536
 
2.7%
Other values (10) 32640
 
10.4%
Distinct 766
Distinct (%) 1.8%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

10.0 7739
Unknown 4446
60.0 3851
30.0 2444
20.0 1588
Other values (761) 22429

Length

Max length 67
Median length 59
Mean length 7.1448573
Min length 3

Characters and Unicode

Total characters 303635
Distinct characters 20
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 120
Unique (%) 0.3%

Sample

1st row 10.0
2nd row 10.0
3rd row 10.0
4th row 10.0
5th row 10.0

Common Values

Value Count Frequency (%)
10.0 7739 18.2%
Unknown 4446 10.5%
60.0 3851 9.1%
30.0 2444 5.8%
20.0 1588 3.7%
15.0 1571 3.7%
5.0 1484 3.5%
45.0 1213 2.9%
Unknown >> 30.0 868 2.0%
Unknown >> 10.0 862 2.0%
Other values (756) 16431 38.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
10.0 12898
18.9%
10655
15.6%
unknown 9800
14.3%
30.0 6711
9.8%
60.0 5199
7.6%
5.0 4155
 
6.1%
15.0 3447
 
5.0%
20.0 3039
 
4.4%
2.0 1794
 
2.6%
1.0 1714
 
2.5%
Other values (104) 8889
13.0%

Most occurring characters

ValueCountFrequency (%)
0 80368
26.5%
. 47846
15.8%
n 29400
 
9.7%
25804
 
8.5%
> 20864
 
6.9%
1 19939
 
6.6%
5 10646
 
3.5%
U 9800
 
3.2%
k 9800
 
3.2%
o 9800
 
3.2%
Other values (10) 39368
13.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 135804
44.7%
Lowercase Letter 58800
19.4%
Other Punctuation 52340
 
17.2%
Space Separator 25804
 
8.5%
Math Symbol 21087
 
6.9%
Uppercase Letter 9800
 
3.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 80368
59.2%
1 19939
 
14.7%
5 10646
 
7.8%
3 7862
 
5.8%
2 6655
 
4.9%
6 5593
 
4.1%
4 2802
 
2.1%
9 1052
 
0.8%
8 495
 
0.4%
7 392
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
n 29400
50.0%
k 9800
 
16.7%
o 9800
 
16.7%
w 9800
 
16.7%
Other Punctuation
ValueCountFrequency (%)
. 47846
91.4%
; 4494
 
8.6%
Math Symbol
ValueCountFrequency (%)
> 20864
98.9%
| 223
 
1.1%
Space Separator
ValueCountFrequency (%)
25804
100.0%
Uppercase Letter
ValueCountFrequency (%)
U 9800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 235035
77.4%
Latin 68600
 
22.6%

Most frequent character per script

Common
ValueCountFrequency (%)
0 80368
34.2%
. 47846
20.4%
25804
 
11.0%
> 20864
 
8.9%
1 19939
 
8.5%
5 10646
 
4.5%
3 7862
 
3.3%
2 6655
 
2.8%
6 5593
 
2.4%
; 4494
 
1.9%
Other values (5) 4964
 
2.1%
Latin
ValueCountFrequency (%)
n 29400
42.9%
U 9800
 
14.3%
k 9800
 
14.3%
o 9800
 
14.3%
w 9800
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 303635
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 80368
26.5%
. 47846
15.8%
n 29400
 
9.7%
25804
 
8.5%
> 20864
 
6.9%
1 19939
 
6.6%
5 10646
 
3.5%
U 9800
 
3.2%
k 9800
 
3.2%
o 9800
 
3.2%
Other values (10) 39368
13.0%

Perovskite_deposition_thermal_annealing_atmosphere
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 31
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 41150
N2 733
Air 187
Unknown >> Unknown 82
Ambient 65
Other values (26) 280

Length

Max length 56
Median length 7
Mean length 6.9419488
Min length 2

Characters and Unicode

Total characters 295012
Distinct characters 27
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
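
As an illustration of the idea, the per-category character counts reported for this variable can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, not the profiling library's own implementation: the helper name `category_counts` and the small sample of values are chosen here for demonstration, and the mapping covers only the categories that appear in this section.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general categories mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# A handful of values taken from this variable's Common Values table.
sample = ["Unknown", "N2", "Unknown >> Air", "N2; Ambient"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Under this classification `>` and `|` count as Math Symbol and `;` as Other Punctuation, which matches how the delimiters show up in the category tables of this report.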

Unique

Unique 3
Unique (%) < 0.1%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 41150 96.8%
N2 733 1.7%
Air 187 0.4%
Unknown >> Unknown 82 0.2%
Ambient 65 0.2%
N2 >> N2 44 0.1%
N2; Ambient 40 0.1%
Ar 38 0.1%
Air >> Air 28 0.1%
Unknown >> Air 18 < 0.1%
Other values (21) 112 0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 41351
95.7%
n2 983
 
2.3%
air 341
 
0.8%
318
 
0.7%
ambient 105
 
0.2%
ar 54
 
0.1%
vacuum 40
 
0.1%
dry 23
 
0.1%
70 3
 
< 0.1%
o2 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
n 124158
42.1%
U 41351
 
14.0%
k 41351
 
14.0%
o 41351
 
14.0%
w 41351
 
14.0%
2 984
 
0.3%
N 983
 
0.3%
722
 
0.2%
> 606
 
0.2%
A 477
 
0.2%
Other values (17) 1678
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 249741
84.7%
Uppercase Letter 42875
 
14.5%
Decimal Number 990
 
0.3%
Space Separator 722
 
0.2%
Math Symbol 621
 
0.2%
Other Punctuation 63
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 124158
49.7%
k 41351
 
16.6%
o 41351
 
16.6%
w 41351
 
16.6%
i 446
 
0.2%
r 418
 
0.2%
m 145
 
0.1%
t 105
 
< 0.1%
b 105
 
< 0.1%
e 105
 
< 0.1%
Other values (4) 206
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
U 41351
96.4%
N 983
 
2.3%
A 477
 
1.1%
V 40
 
0.1%
D 23
 
0.1%
O 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 984
99.4%
7 3
 
0.3%
0 3
 
0.3%
Math Symbol
ValueCountFrequency (%)
> 606
97.6%
| 15
 
2.4%
Space Separator
ValueCountFrequency (%)
722
100.0%
Other Punctuation
ValueCountFrequency (%)
; 63
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 292616
99.2%
Common 2396
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 124158
42.4%
U 41351
 
14.1%
k 41351
 
14.1%
o 41351
 
14.1%
w 41351
 
14.1%
N 983
 
0.3%
A 477
 
0.2%
i 446
 
0.2%
r 418
 
0.1%
m 145
 
< 0.1%
Other values (10) 585
 
0.2%
Common
ValueCountFrequency (%)
2 984
41.1%
722
30.1%
> 606
25.3%
; 63
 
2.6%
| 15
 
0.6%
7 3
 
0.1%
0 3
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 295012
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 124158
42.1%
U 41351
 
14.0%
k 41351
 
14.0%
o 41351
 
14.0%
w 41351
 
14.0%
2 984
 
0.3%
N 983
 
0.3%
722
 
0.2%
> 606
 
0.2%
A 477
 
0.2%
Other values (17) 1678
 
0.6%

Perovskite_deposition_solvent_annealing
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB

False 41607
True 890

Value Count Frequency (%)
False 41607 97.9%
True 890 2.1%

Perovskite_deposition_solvent_annealing_solvent_atmosphere
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 38
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 41797
DMF 222
DMSO 214
DMF; DMSO 29
Vacuum 27
Other values (33) 208

Length

Max length 26
Median length 7
Mean length 6.9736452
Min length 3

Characters and Unicode

Total characters 296359
Distinct characters 44
Distinct categories 6
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2
Unique (%) < 0.1%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 41797 98.4%
DMF 222 0.5%
DMSO 214 0.5%
DMF; DMSO 29 0.1%
Vacuum 27 0.1%
Chlorobenzene; DMF 24 0.1%
HCl 21 < 0.1%
H2O 19 < 0.1%
IPA 17 < 0.1%
MACl 15 < 0.1%
Other values (28) 112 0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 41797
98.1%
dmf 282
 
0.7%
dmso 263
 
0.6%
chlorobenzene 42
 
0.1%
vacuum 27
 
0.1%
h2o 24
 
0.1%
ipa 24
 
0.1%
hcl 21
 
< 0.1%
macl 15
 
< 0.1%
air 10
 
< 0.1%
Other values (20) 89
 
0.2%

Most occurring characters

ValueCountFrequency (%)
n 125545
42.4%
o 41928
 
14.1%
U 41797
 
14.1%
k 41797
 
14.1%
w 41797
 
14.1%
M 583
 
0.2%
D 552
 
0.2%
O 287
 
0.1%
F 282
 
0.1%
S 263
 
0.1%
Other values (34) 1528
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 252049
85.0%
Uppercase Letter 44091
 
14.9%
Space Separator 97
 
< 0.1%
Other Punctuation 88
 
< 0.1%
Decimal Number 30
 
< 0.1%
Dash Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 125545
49.8%
o 41928
 
16.6%
k 41797
 
16.6%
w 41797
 
16.6%
e 207
 
0.1%
l 147
 
0.1%
h 84
 
< 0.1%
a 81
 
< 0.1%
r 67
 
< 0.1%
u 65
 
< 0.1%
Other values (11) 331
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
U 41797
94.8%
M 583
 
1.3%
D 552
 
1.3%
O 287
 
0.7%
F 282
 
0.6%
S 263
 
0.6%
C 82
 
0.2%
A 54
 
0.1%
H 50
 
0.1%
P 34
 
0.1%
Other values (8) 107
 
0.2%
Decimal Number
ValueCountFrequency (%)
2 24
80.0%
4 6
 
20.0%
Space Separator
ValueCountFrequency (%)
97
100.0%
Other Punctuation
ValueCountFrequency (%)
; 88
100.0%
Dash Punctuation
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 296140
99.9%
Common 219
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 125545
42.4%
o 41928
 
14.2%
U 41797
 
14.1%
k 41797
 
14.1%
w 41797
 
14.1%
M 583
 
0.2%
D 552
 
0.2%
O 287
 
0.1%
F 282
 
0.1%
S 263
 
0.1%
Other values (29) 1309
 
0.4%
Common
ValueCountFrequency (%)
97
44.3%
; 88
40.2%
2 24
 
11.0%
4 6
 
2.7%
4
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 296355
> 99.9%
Punctuation 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 125545
42.4%
o 41928
 
14.1%
U 41797
 
14.1%
k 41797
 
14.1%
w 41797
 
14.1%
M 583
 
0.2%
D 552
 
0.2%
O 287
 
0.1%
F 282
 
0.1%
S 263
 
0.1%
Other values (33) 1524
 
0.5%
Punctuation
ValueCountFrequency (%)
4
100.0%

Perovskite_deposition_after_treatment_of_formed_perovskite
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 103
Distinct (%) 6.3%
Missing 40873
Missing (%) 96.2%
Memory size 332.1 KiB

Washed with IPA 1078
Dried under flow of clean air 51
UV radiation 29
Washed with Methyl acetate 23
Washed with IPA >> Washed with Dichloromethane 21
Other values (98) 422
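
The figures above are consistent with Missing (%) being taken over all 42497 observations and Distinct (%) over the non-missing rows only. A small sketch of that arithmetic follows; the function name is illustrative, and the choice of denominators is inferred from the numbers rather than stated in the report.

```python
def profile_percentages(n_rows, n_missing, n_distinct):
    """Return (missing %, distinct %) rounded to one decimal place.

    Assumption: missing % uses all rows as the denominator, while
    distinct % uses only the rows that have a value.
    """
    n_present = n_rows - n_missing
    missing_pct = 100 * n_missing / n_rows
    distinct_pct = 100 * n_distinct / n_present
    return round(missing_pct, 1), round(distinct_pct, 1)


# Counts reported for this variable; 42497 is the dataset's number of observations.
print(profile_percentages(n_rows=42497, n_missing=40873, n_distinct=103))
```

With 1624 non-missing rows, 103 distinct values give roughly 6.3%, which is why a sparsely populated column can still carry a high-cardinality alert.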

Length

Max length 106
Median length 15
Mean length 18.455049
Min length 4

Characters and Unicode

Total characters 29971
Distinct characters 61
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 23
Unique (%) 1.4%

Sample

1st row Washed with IPA
2nd row Washed with IPA
3rd row Washed with IPA
4th row Washed with IPA
5th row Washed with IPA

Common Values

Value Count Frequency (%)
Washed with IPA 1078 2.5%
Dried under flow of clean air 51 0.1%
UV radiation 29 0.1%
Washed with Methyl acetate 23 0.1%
Washed with IPA >> Washed with Dichloromethane 21 < 0.1%
Ultrasonic transducer 16 < 0.1%
Light soaking 14 < 0.1%
Intense pulsed light annealing 13 < 0.1%
Washed with Ether 13 < 0.1%
Washed with Toluene 13 < 0.1%
Other values (93) 353 0.8%
(Missing) 40873 96.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
with 1252
23.5%
washed 1240
23.3%
ipa 1132
21.3%
in 73
 
1.4%
of 71
 
1.3%
annealing 71
 
1.3%
under 67
 
1.3%
flow 66
 
1.2%
air 64
 
1.2%
radiation 64
 
1.2%
Other values (152) 1219
22.9%

Most occurring characters

ValueCountFrequency (%)
3705
12.4%
h 2738
 
9.1%
e 2326
 
7.8%
i 2170
 
7.2%
a 2168
 
7.2%
t 1927
 
6.4%
d 1642
 
5.5%
s 1557
 
5.2%
w 1350
 
4.5%
W 1240
 
4.1%
Other values (51) 9148
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20546
68.6%
Uppercase Letter 5465
 
18.2%
Space Separator 3705
 
12.4%
Math Symbol 120
 
0.4%
Decimal Number 55
 
0.2%
Dash Punctuation 43
 
0.1%
Other Punctuation 23
 
0.1%
Close Punctuation 7
 
< 0.1%
Open Punctuation 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
h 2738
13.3%
e 2326
11.3%
i 2170
10.6%
a 2168
10.6%
t 1927
9.4%
d 1642
8.0%
s 1557
7.6%
w 1350
6.6%
n 1042
 
5.1%
o 708
 
3.4%
Other values (14) 2918
14.2%
Uppercase Letter
ValueCountFrequency (%)
W 1240
22.7%
I 1208
22.1%
A 1205
22.0%
P 1159
21.2%
D 151
 
2.8%
S 66
 
1.2%
M 61
 
1.1%
U 56
 
1.0%
E 47
 
0.9%
V 45
 
0.8%
Other values (10) 227
 
4.2%
Decimal Number
ValueCountFrequency (%)
2 25
45.5%
1 11
20.0%
5 10
 
18.2%
3 4
 
7.3%
0 4
 
7.3%
4 1
 
1.8%
Other Punctuation
ValueCountFrequency (%)
/ 7
30.4%
@ 7
30.4%
: 4
17.4%
; 2
 
8.7%
, 2
 
8.7%
. 1
 
4.3%
Space Separator
ValueCountFrequency (%)
3705
100.0%
Math Symbol
ValueCountFrequency (%)
> 120
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 43
100.0%
Close Punctuation
ValueCountFrequency (%)
) 7
100.0%
Open Punctuation
ValueCountFrequency (%)
( 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 26011
86.8%
Common 3960
 
13.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
h 2738
 
10.5%
e 2326
 
8.9%
i 2170
 
8.3%
a 2168
 
8.3%
t 1927
 
7.4%
d 1642
 
6.3%
s 1557
 
6.0%
w 1350
 
5.2%
W 1240
 
4.8%
I 1208
 
4.6%
Other values (34) 7685
29.5%
Common
ValueCountFrequency (%)
3705
93.6%
> 120
 
3.0%
- 43
 
1.1%
2 25
 
0.6%
1 11
 
0.3%
5 10
 
0.3%
) 7
 
0.2%
/ 7
 
0.2%
( 7
 
0.2%
@ 7
 
0.2%
Other values (7) 18
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29971
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3705
12.4%
h 2738
 
9.1%
e 2326
 
7.8%
i 2170
 
7.2%
a 2168
 
7.2%
t 1927
 
6.4%
d 1642
 
5.5%
s 1557
 
5.2%
w 1350
 
4.5%
W 1240
 
4.1%
Other values (51) 9148
30.5%

Perovskite_surface_treatment_before_next_deposition_step
Categorical

HIGH CORRELATION  MISSING  UNIFORM 

Distinct 2
Distinct (%) 50.0%
Missing 42493
Missing (%) > 99.9%
Memory size 332.1 KiB

Ar plasma
UV

Length

Max length 9
Median length 5.5
Mean length 5.5
Min length 2

Characters and Unicode

Total characters 22
Distinct characters 10
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Ar plasma
2nd row Ar plasma
3rd row UV
4th row UV

Common Values

Value Count Frequency (%)
Ar plasma 2 < 0.1%
UV 2 < 0.1%
(Missing) 42493 > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
ar 2
33.3%
plasma 2
33.3%
uv 2
33.3%

Most occurring characters

ValueCountFrequency (%)
a 4
18.2%
A 2
9.1%
r 2
9.1%
2
9.1%
p 2
9.1%
l 2
9.1%
s 2
9.1%
m 2
9.1%
U 2
9.1%
V 2
9.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 14
63.6%
Uppercase Letter 6
27.3%
Space Separator 2
 
9.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 4
28.6%
r 2
14.3%
p 2
14.3%
l 2
14.3%
s 2
14.3%
m 2
14.3%
Uppercase Letter
ValueCountFrequency (%)
A 2
33.3%
U 2
33.3%
V 2
33.3%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20
90.9%
Common 2
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 4
20.0%
A 2
10.0%
r 2
10.0%
p 2
10.0%
l 2
10.0%
s 2
10.0%
m 2
10.0%
U 2
10.0%
V 2
10.0%
Common
ValueCountFrequency (%)
2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 22
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 4
18.2%
A 2
9.1%
r 2
9.1%
2
9.1%
p 2
9.1%
l 2
9.1%
s 2
9.1%
m 2
9.1%
U 2
9.1%
V 2
9.1%

HTL_stack_sequence
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 1959
Distinct (%) 4.6%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Spiro-MeOTAD 20918
PEDOT:PSS 6591
none 2626
PTAA 1854
NiO-c 1700
Other values (1954) 8808

Length

Max length 231
Median length 183
Mean length 10.144904
Min length 1

Characters and Unicode

Total characters 431128
Distinct characters 83
Distinct categories 12
Distinct scripts 2
Distinct blocks 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 928
Unique (%) 2.2%

Sample

1st row Spiro-MeOTAD
2nd row Spiro-MeOTAD
3rd row Spiro-MeOTAD
4th row Spiro-MeOTAD
5th row Spiro-MeOTAD

Common Values

Value Count Frequency (%)
Spiro-MeOTAD 20918 49.2%
PEDOT:PSS 6591 15.5%
none 2626 6.2%
PTAA 1854 4.4%
NiO-c 1700 4.0%
P3HT 885 2.1%
NiO-np 411 1.0%
CuSCN 220 0.5%
NiMgLiO 160 0.4%
NiO 160 0.4%
Other values (1949) 6972 16.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
spiro-meotad 21617
45.3%
pedot:pss 7214
 
15.1%
none 2626
 
5.5%
ptaa 2167
 
4.5%
2163
 
4.5%
nio-c 1985
 
4.2%
p3ht 988
 
2.1%
nio-np 476
 
1.0%
cuscn 267
 
0.6%
moo3 214
 
0.4%
Other values (1730) 8010
 
16.8%

Most occurring characters

ValueCountFrequency (%)
S 37397
 
8.7%
T 35663
 
8.3%
O 33382
 
7.7%
D 30380
 
7.0%
- 28268
 
6.6%
A 27443
 
6.4%
e 26831
 
6.2%
i 26520
 
6.2%
o 26294
 
6.1%
p 23343
 
5.4%
Other values (73) 135607
31.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 230202
53.4%
Lowercase Letter 147843
34.3%
Dash Punctuation 28358
 
6.6%
Other Punctuation 8984
 
2.1%
Decimal Number 6982
 
1.6%
Space Separator 5228
 
1.2%
Math Symbol 2145
 
0.5%
Open Punctuation 691
 
0.2%
Close Punctuation 684
 
0.2%
Final Punctuation 5
 
< 0.1%
Other values (2) 6
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 37397
16.2%
T 35663
15.5%
O 33382
14.5%
D 30380
13.2%
A 27443
11.9%
M 23038
10.0%
P 21598
9.4%
E 7684
 
3.3%
N 4458
 
1.9%
C 2795
 
1.2%
Other values (16) 6364
 
2.8%
Lowercase Letter
ValueCountFrequency (%)
e 26831
18.1%
i 26520
17.9%
o 26294
17.8%
p 23343
15.8%
r 22779
15.4%
n 7704
 
5.2%
c 2763
 
1.9%
l 1468
 
1.0%
a 1405
 
1.0%
h 1275
 
0.9%
Other values (15) 7461
 
5.0%
Decimal Number
ValueCountFrequency (%)
3 2251
32.2%
2 1581
22.6%
1 877
 
12.6%
4 678
 
9.7%
0 376
 
5.4%
5 343
 
4.9%
6 325
 
4.7%
7 212
 
3.0%
9 198
 
2.8%
8 141
 
2.0%
Other Punctuation
ValueCountFrequency (%)
: 7399
82.4%
, 720
 
8.0%
; 500
 
5.6%
' 160
 
1.8%
86
 
1.0%
@ 67
 
0.7%
. 52
 
0.6%
Dash Punctuation
ValueCountFrequency (%)
- 28268
99.7%
87
 
0.3%
3
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 496
71.8%
[ 192
 
27.8%
{ 3
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 492
71.9%
] 189
 
27.6%
} 3
 
0.4%
Modifier Symbol
ValueCountFrequency (%)
´ 1
50.0%
` 1
50.0%
Space Separator
ValueCountFrequency (%)
5228
100.0%
Math Symbol
ValueCountFrequency (%)
| 2145
100.0%
Final Punctuation
ValueCountFrequency (%)
5
100.0%
Control
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 378045
87.7%
Common 53083
 
12.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 37397
9.9%
T 35663
9.4%
O 33382
 
8.8%
D 30380
 
8.0%
A 27443
 
7.3%
e 26831
 
7.1%
i 26520
 
7.0%
o 26294
 
7.0%
p 23343
 
6.2%
M 23038
 
6.1%
Other values (41) 87754
23.2%
Common
ValueCountFrequency (%)
- 28268
53.3%
: 7399
 
13.9%
5228
 
9.8%
3 2251
 
4.2%
| 2145
 
4.0%
2 1581
 
3.0%
1 877
 
1.7%
, 720
 
1.4%
4 678
 
1.3%
; 500
 
0.9%
Other values (22) 3436
 
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 430946
> 99.9%
Punctuation 181
 
< 0.1%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 37397
 
8.7%
T 35663
 
8.3%
O 33382
 
7.7%
D 30380
 
7.0%
- 28268
 
6.6%
A 27443
 
6.4%
e 26831
 
6.2%
i 26520
 
6.2%
o 26294
 
6.1%
p 23343
 
5.4%
Other values (68) 135425
31.4%
Punctuation
ValueCountFrequency (%)
87
48.1%
86
47.5%
5
 
2.8%
3
 
1.7%
None
ValueCountFrequency (%)
´ 1
100.0%

HTL_thickness_list
Categorical

HIGH CARDINALITY  MISSING 

Distinct 439
Distinct (%) 4.4%
Missing 32425
Missing (%) 76.3%
Memory size 332.1 KiB

200.0 1132
40.0 936
150.0 704
30.0 682
100.0 550
Other values (434) 6068

Length

Max length 16
Median length 15
Mean length 5.0764496
Min length 3

Characters and Unicode

Total characters 51130
Distinct characters 22
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 181
Unique (%) 1.8%

Sample

1st row 50.0
2nd row 50.0
3rd row 50.0
4th row 50.0
5th row 70.0 | nan

Common Values

Value Count Frequency (%)
200.0 1132 2.7%
40.0 936 2.2%
150.0 704 1.7%
30.0 682 1.6%
100.0 550 1.3%
50.0 496 1.2%
20.0 487 1.1%
250.0 294 0.7%
35.0 227 0.5%
60.0 226 0.5%
Other values (429) 4338 10.2%
(Missing) 32425 76.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
200.0 1187
 
9.9%
40.0 1027
 
8.6%
944
 
7.9%
150.0 738
 
6.2%
30.0 725
 
6.1%
20.0 593
 
5.0%
100.0 581
 
4.9%
50.0 545
 
4.6%
10.0 516
 
4.3%
nan 482
 
4.0%
Other values (250) 4622
38.6%

Most occurring characters

ValueCountFrequency (%)
0 21377
41.8%
. 10527
20.6%
1 3363
 
6.6%
2 3282
 
6.4%
5 3050
 
6.0%
1888
 
3.7%
3 1630
 
3.2%
4 1535
 
3.0%
n 976
 
1.9%
| 944
 
1.8%
Other values (12) 2558
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 36285
71.0%
Other Punctuation 10527
 
20.6%
Space Separator 1888
 
3.7%
Lowercase Letter 1483
 
2.9%
Math Symbol 944
 
1.8%
Uppercase Letter 3
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 21377
58.9%
1 3363
 
9.3%
2 3282
 
9.0%
5 3050
 
8.4%
3 1630
 
4.5%
4 1535
 
4.2%
8 682
 
1.9%
6 594
 
1.6%
7 514
 
1.4%
9 258
 
0.7%
Lowercase Letter
ValueCountFrequency (%)
n 976
65.8%
a 482
32.5%
u 7
 
0.5%
k 4
 
0.3%
o 4
 
0.3%
w 4
 
0.3%
r 3
 
0.2%
e 3
 
0.2%
Other Punctuation
ValueCountFrequency (%)
. 10527
100.0%
Space Separator
ValueCountFrequency (%)
1888
100.0%
Math Symbol
ValueCountFrequency (%)
| 944
100.0%
Uppercase Letter
ValueCountFrequency (%)
T 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 49644
97.1%
Latin 1486
 
2.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 21377
43.1%
. 10527
21.2%
1 3363
 
6.8%
2 3282
 
6.6%
5 3050
 
6.1%
1888
 
3.8%
3 1630
 
3.3%
4 1535
 
3.1%
| 944
 
1.9%
8 682
 
1.4%
Other values (3) 1366
 
2.8%
Latin
ValueCountFrequency (%)
n 976
65.7%
a 482
32.4%
u 7
 
0.5%
k 4
 
0.3%
o 4
 
0.3%
w 4
 
0.3%
T 3
 
0.2%
r 3
 
0.2%
e 3
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51130
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 21377
41.8%
. 10527
20.6%
1 3363
 
6.6%
2 3282
 
6.4%
5 3050
 
6.0%
1888
 
3.7%
3 1630
 
3.2%
4 1535
 
3.0%
n 976
 
1.9%
| 944
 
1.8%
Other values (12) 2558
 
5.0%

HTL_additives_compounds
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 380
Distinct (%) 1.6%
Missing 18371
Missing (%) 43.2%
Memory size 332.1 KiB

Li-TFSI; TBP 16635
FK209; Li-TFSI; TBP 3475
Undoped 422
FK102; Li-TFSI; TBP 388
F4-TCNQ 322
Other values (375) 2884

Length

Max length 71
Median length 12
Mean length 13.158543
Min length 1

Characters and Unicode

Total characters 317463
Distinct characters 74
Distinct categories 9
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 120
Unique (%) 0.5%

Sample

1st row Li(CF3SO2)2N; TBP
2nd row Li(CF3SO2)2N; TBP
3rd row Li(CF3SO2)2N; TBP
4th row Li(CF3SO2)2N; TBP
5th row Li(CF3SO2)2N; TBP

Common Values

Value Count Frequency (%)
Li-TFSI; TBP 16635 39.1%
FK209; Li-TFSI; TBP 3475 8.2%
Undoped 422 1.0%
FK102; Li-TFSI; TBP 388 0.9%
F4-TCNQ 322 0.8%
Unknown | Li-TFSI; TBP 243 0.6%
Li-TFSI 209 0.5%
Cu 202 0.5%
Unknown | FK209; Li-TFSI; TBP 108 0.3%
Li-TFSI; TBP; FK209 102 0.2%
Other values (370) 2020 4.8%
(Missing) 18371 43.2%
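
The table above lists `FK209; Li-TFSI; TBP` (3475 rows) and `Li-TFSI; TBP; FK209` (102 rows) as separate values even though they name the same three compounds. A possible clean-up, sketched under the assumption that `;` separates additives within a layer and `|` separates layers (neither is defined in the report), is to sort the additives inside each layer. The function name `canonical_additives` is illustrative.

```python
def canonical_additives(raw):
    """Sort the ';'-separated additives inside each '|'-separated layer.

    Layer order is preserved; only the order within a layer is normalised.
    The delimiter meanings are an assumption.
    """
    layers = []
    for layer in raw.split("|"):
        parts = sorted(p.strip() for p in layer.split(";") if p.strip())
        layers.append("; ".join(parts))
    return " | ".join(layers)


print(canonical_additives("Li-TFSI; TBP; FK209"))
print(canonical_additives("Unknown | Li-TFSI; TBP"))
```

If the additive order is meant to line up with the entries in HTL_additives_concentrations, sorting one column alone would break that pairing, so both columns would need to be reordered together.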

Length

Histogram of lengths of the category
ValueCountFrequency (%)
li-tfsi 21799
41.8%
tbp 21653
41.5%
fk209 3787
 
7.3%
774
 
1.5%
unknown 650
 
1.2%
undoped 533
 
1.0%
fk102 436
 
0.8%
f4-tcnq 379
 
0.7%
cu 239
 
0.5%
co 66
 
0.1%
Other values (304) 1801
 
3.5%

Most occurring characters

ValueCountFrequency (%)
T 44306
14.0%
27991
8.8%
F 26859
8.5%
; 26258
8.3%
- 22677
 
7.1%
i 22414
 
7.1%
S 22161
 
7.0%
I 22036
 
6.9%
P 22034
 
6.9%
L 21971
 
6.9%
Other values (64) 58756
18.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 190139
59.9%
Lowercase Letter 35035
 
11.0%
Space Separator 27991
 
8.8%
Other Punctuation 26364
 
8.3%
Dash Punctuation 22677
 
7.1%
Decimal Number 14009
 
4.4%
Math Symbol 780
 
0.2%
Open Punctuation 234
 
0.1%
Close Punctuation 234
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 44306
23.3%
F 26859
14.1%
S 22161
11.7%
I 22036
11.6%
P 22034
11.6%
L 21971
11.6%
B 21874
11.5%
K 4280
 
2.3%
C 1192
 
0.6%
U 1191
 
0.6%
Other values (16) 2235
 
1.2%
Lowercase Letter
ValueCountFrequency (%)
i 22414
64.0%
n 2970
 
8.5%
o 1801
 
5.1%
d 1324
 
3.8%
e 1081
 
3.1%
p 763
 
2.2%
k 668
 
1.9%
w 650
 
1.9%
u 486
 
1.4%
a 472
 
1.3%
Other values (14) 2406
 
6.9%
Decimal Number
ValueCountFrequency (%)
2 4606
32.9%
0 4265
30.4%
9 3822
27.3%
4 482
 
3.4%
1 472
 
3.4%
3 225
 
1.6%
6 87
 
0.6%
5 36
 
0.3%
8 14
 
0.1%
Other Punctuation
ValueCountFrequency (%)
; 26258
99.6%
, 47
 
0.2%
@ 30
 
0.1%
18
 
0.1%
: 7
 
< 0.1%
. 3
 
< 0.1%
* 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
| 774
99.2%
+ 6
 
0.8%
Open Punctuation
ValueCountFrequency (%)
( 227
97.0%
[ 7
 
3.0%
Close Punctuation
ValueCountFrequency (%)
) 227
97.0%
] 7
 
3.0%
Space Separator
ValueCountFrequency (%)
27991
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 22677
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 225174
70.9%
Common 92289
29.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 44306
19.7%
F 26859
11.9%
i 22414
10.0%
S 22161
9.8%
I 22036
9.8%
P 22034
9.8%
L 21971
9.8%
B 21874
9.7%
K 4280
 
1.9%
n 2970
 
1.3%
Other values (40) 14269
 
6.3%
Common
ValueCountFrequency (%)
27991
30.3%
; 26258
28.5%
- 22677
24.6%
2 4606
 
5.0%
0 4265
 
4.6%
9 3822
 
4.1%
| 774
 
0.8%
4 482
 
0.5%
1 472
 
0.5%
( 227
 
0.2%
Other values (14) 715
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 317445
> 99.9%
Punctuation 18
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 44306
14.0%
27991
8.8%
F 26859
8.5%
; 26258
8.3%
- 22677
 
7.1%
i 22414
 
7.1%
S 22161
 
7.0%
I 22036
 
6.9%
P 22034
 
6.9%
L 21971
 
6.9%
Other values (63) 58738
18.5%
Punctuation
ValueCountFrequency (%)
18
100.0%

HTL_additives_concentrations
Categorical

HIGH CARDINALITY  MISSING 

Distinct 223
Distinct (%) 21.5%
Missing 41462
Missing (%) 97.6%
Memory size 332.1 KiB

9.1 mg/ml; 0.029 ml/ml 51
14.2 mM; 8 vol% 40
17.5 uL(520mg/mLACN); 28.8 uL 34
nan | 2 uL/mL 30
9.1 mg/ml; 28.8 µl/ml 25
Other values (218) 855

Length

Max length 52
Median length 35
Mean length 20.761353
Min length 3

Characters and Unicode

Total characters 21488
Distinct characters 36
Distinct categories 8
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 73
Unique (%) 7.1%

Sample

1st row 2 wt%
2nd row 5 wt%
3rd row 10 wt%
4th row 0.5 vol%; nan; nan
5th row 1 vol%; nan; nan

Common Values

Value Count Frequency (%)
9.1 mg/ml; 0.029 ml/ml 51 0.1%
14.2 mM; 8 vol% 40 0.1%
17.5 uL(520mg/mLACN); 28.8 uL 34 0.1%
nan | 2 uL/mL 30 0.1%
9.1 mg/ml; 28.8 µl/ml 25 0.1%
9.1 mg/ml; 0.0288 ml/ml 22 0.1%
520 mg/ml; 0.0338 vol% 20 < 0.1%
5.2 mg/ml; 0.02 ml/ml 17 < 0.1%
22.5 uL; 15 uL 16 < 0.1%
520 mg/ml; 2.88 vol% 15 < 0.1%
Other values (213) 765 1.8%
(Missing) 41462 97.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mg/ml 438
 
10.4%
vol 345
 
8.2%
ul 216
 
5.1%
mm 164
 
3.9%
ml/ml 154
 
3.7%
9.1 147
 
3.5%
nan 144
 
3.4%
28.8 125
 
3.0%
107
 
2.5%
µl/ml 107
 
2.5%
Other values (183) 2271
53.8%

Most occurring characters

ValueCountFrequency (%)
3183
14.8%
m 2062
 
9.6%
0 1364
 
6.3%
l 1357
 
6.3%
. 1264
 
5.9%
; 1077
 
5.0%
/ 995
 
4.6%
2 946
 
4.4%
L 809
 
3.8%
1 715
 
3.3%
Other values (26) 7716
35.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6084
28.3%
Decimal Number 5959
27.7%
Other Punctuation 3871
18.0%
Space Separator 3183
14.8%
Uppercase Letter 1813
 
8.4%
Open Punctuation 261
 
1.2%
Close Punctuation 261
 
1.2%
Math Symbol 56
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
m 2062
33.9%
l 1357
22.3%
g 696
 
11.4%
u 523
 
8.6%
o 406
 
6.7%
v 345
 
5.7%
n 288
 
4.7%
a 150
 
2.5%
µ 107
 
1.8%
t 78
 
1.3%
Decimal Number
ValueCountFrequency (%)
0 1364
22.9%
2 946
15.9%
1 715
12.0%
5 711
11.9%
8 708
11.9%
3 487
 
8.2%
9 342
 
5.7%
7 295
 
5.0%
4 206
 
3.5%
6 185
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
L 809
44.6%
C 249
 
13.7%
A 247
 
13.6%
N 247
 
13.6%
M 245
 
13.5%
K 14
 
0.8%
B 2
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 1264
32.7%
; 1077
27.8%
/ 995
25.7%
% 535
13.8%
Space Separator
ValueCountFrequency (%)
3183
100.0%
Open Punctuation
ValueCountFrequency (%)
( 261
100.0%
Close Punctuation
ValueCountFrequency (%)
) 261
100.0%
Math Symbol
ValueCountFrequency (%)
| 56
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13698
63.7%
Latin 7790
36.3%

Most frequent character per script

Common
ValueCountFrequency (%)
3183
23.2%
0 1364
10.0%
. 1264
 
9.2%
; 1077
 
7.9%
/ 995
 
7.3%
2 946
 
6.9%
1 715
 
5.2%
5 711
 
5.2%
8 708
 
5.2%
% 535
 
3.9%
Other values (9) 2200
16.1%
Latin
ValueCountFrequency (%)
m 2062
26.5%
l 1357
17.4%
L 809
 
10.4%
g 696
 
8.9%
u 523
 
6.7%
o 406
 
5.2%
v 345
 
4.4%
n 288
 
3.7%
C 249
 
3.2%
A 247
 
3.2%
Other values (7) 808
 
10.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21381
99.5%
None 107
 
0.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3183
14.9%
m 2062
 
9.6%
0 1364
 
6.4%
l 1357
 
6.3%
. 1264
 
5.9%
; 1077
 
5.0%
/ 995
 
4.7%
2 946
 
4.4%
L 809
 
3.8%
1 715
 
3.3%
Other values (25) 7609
35.6%
None
ValueCountFrequency (%)
µ 107
100.0%

HTL_deposition_procedure
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 120
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Spin-coating 35729
Unknown 2906
Spin-coating | Spin-coating 1342
Evaporation 351
Spray-pyrolys 256
Other values (115) 1913

Length

Max length 91
Median length 12
Mean length 12.462268
Min length 3

Characters and Unicode

Total characters 529609
Distinct characters 47
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20
Unique (%) < 0.1%

Sample

1st row Spin-coating
2nd row Spin-coating
3rd row Spin-coating
4th row Spin-coating
5th row Spin-coating

Common Values

Value Count Frequency (%)
Spin-coating 35729 84.1%
Unknown 2906 6.8%
Spin-coating | Spin-coating 1342 3.2%
Evaporation 351 0.8%
Spray-pyrolys 256 0.6%
Doctor blading 181 0.4%
Spin-coating | Evaporation 146 0.3%
Evaporation | Evaporation 141 0.3%
Spray-coating 118 0.3%
Sputtering 106 0.2%
Other values (110) 1221 2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
spin-coating 39035
82.1%
unknown 2927
 
6.2%
2167
 
4.6%
evaporation 950
 
2.0%
spray-pyrolys 285
 
0.6%
sputtering 231
 
0.5%
blading 204
 
0.4%
doctor 204
 
0.4%
spray-coating 163
 
0.3%
printing 88
 
0.2%
Other values (58) 1272
 
2.7%
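
The token counts above look consistent with lowercasing each value, splitting on whitespace, and stripping surrounding punctuation, so that separator tokens such as | end up with an empty label. The unlabeled row (2167) would match the 2103 occurrences of | plus 64 pairs of >> reported in the Math Symbol table for this variable. A minimal sketch under that assumption; the function name `token_counts` is hypothetical and this is not confirmed to be the profiler's own implementation:

```python
from collections import Counter
import string


def token_counts(values):
    """Count lowercase whitespace-separated tokens, stripping surrounding punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # A bare separator such as "|" becomes the empty string here.
            counts[token.strip(string.punctuation)] += 1
    return counts


# Illustrative values taken from the Common Values table above.
sample = ["Spin-coating", "Spin-coating | Evaporation", "Doctor blading", "Unknown"]
counts = token_counts(sample)
```

With this tokenization, "Doctor blading" contributes one "doctor" and one "blading" token, which would explain why those two rows carry identical counts in the table.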

Most occurring characters

ValueCountFrequency (%)
n 89582
16.9%
i 80929
15.3%
o 45819
8.7%
a 42565
8.0%
t 41837
7.9%
p 41578
7.9%
g 40050
7.6%
c 39946
7.5%
S 39797
7.5%
- 39757
7.5%
Other values (37) 27749
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 437556
82.6%
Uppercase Letter 45036
 
8.5%
Dash Punctuation 39757
 
7.5%
Space Separator 5029
 
0.9%
Math Symbol 2231
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 89582
20.5%
i 80929
18.5%
o 45819
10.5%
a 42565
9.7%
t 41837
9.6%
p 41578
9.5%
g 40050
9.2%
c 39946
9.1%
r 3002
 
0.7%
k 2934
 
0.7%
Other values (16) 9314
 
2.1%
Uppercase Letter
ValueCountFrequency (%)
S 39797
88.4%
U 2977
 
6.6%
E 1119
 
2.5%
D 554
 
1.2%
L 154
 
0.3%
A 105
 
0.2%
C 72
 
0.2%
R 55
 
0.1%
B 42
 
0.1%
F 39
 
0.1%
Other values (7) 122
 
0.3%
Math Symbol
ValueCountFrequency (%)
| 2103
94.3%
> 128
 
5.7%
Dash Punctuation
ValueCountFrequency (%)
- 39757
100.0%
Space Separator
ValueCountFrequency (%)
5029
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 482592
91.1%
Common 47017
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 89582
18.6%
i 80929
16.8%
o 45819
9.5%
a 42565
8.8%
t 41837
8.7%
p 41578
8.6%
g 40050
8.3%
c 39946
8.3%
S 39797
8.2%
r 3002
 
0.6%
Other values (33) 17487
 
3.6%
Common
ValueCountFrequency (%)
- 39757
84.6%
5029
 
10.7%
| 2103
 
4.5%
> 128
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 529609
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 89582
16.9%
i 80929
15.3%
o 45819
8.7%
a 42565
8.0%
t 41837
7.9%
p 41578
7.9%
g 40050
7.6%
c 39946
7.5%
S 39797
7.5%
- 39757
7.5%
Other values (37) 27749
 
5.2%

HTL_deposition_synthesis_atmosphere
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 19
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 41362
N2 634
Air 310
Ambient 47
Ar 30
Other values (14) 114

Length

Max length 15
Median length 7
Mean length 6.8972633
Min length 2

Characters and Unicode

Total characters 293113
Distinct characters 27
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 41362 97.3%
N2 634 1.5%
Air 310 0.7%
Ambient 47 0.1%
Ar 30 0.1%
Dry air 26 0.1%
N2 | Vacuum 18 < 0.1%
Air | N2 13 < 0.1%
Ar; O2 10 < 0.1%
N2 | N2 10 < 0.1%
Other values (9) 37 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 41362
96.9%
n2 691
 
1.6%
air 362
 
0.8%
68
 
0.2%
vacuum 52
 
0.1%
ar 49
 
0.1%
ambient 47
 
0.1%
dry 26
 
0.1%
o2 10
 
< 0.1%
methanol 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
n 124135
42.4%
o 41364
 
14.1%
U 41362
 
14.1%
k 41362
 
14.1%
w 41362
 
14.1%
2 701
 
0.2%
N 691
 
0.2%
r 437
 
0.1%
A 432
 
0.1%
i 409
 
0.1%
Other values (17) 858
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 249581
85.1%
Uppercase Letter 42573
 
14.5%
Decimal Number 701
 
0.2%
Space Separator 172
 
0.1%
Math Symbol 76
 
< 0.1%
Other Punctuation 10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 124135
49.7%
o 41364
 
16.6%
k 41362
 
16.6%
w 41362
 
16.6%
r 437
 
0.2%
i 409
 
0.2%
u 104
 
< 0.1%
m 101
 
< 0.1%
a 80
 
< 0.1%
c 52
 
< 0.1%
Other values (6) 175
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
U 41362
97.2%
N 691
 
1.6%
A 432
 
1.0%
V 52
 
0.1%
D 26
 
0.1%
O 10
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
| 60
78.9%
> 16
 
21.1%
Decimal Number
ValueCountFrequency (%)
2 701
100.0%
Space Separator
ValueCountFrequency (%)
172
100.0%
Other Punctuation
ValueCountFrequency (%)
; 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 292154
99.7%
Common 959
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 124135
42.5%
o 41364
 
14.2%
U 41362
 
14.2%
k 41362
 
14.2%
w 41362
 
14.2%
N 691
 
0.2%
r 437
 
0.1%
A 432
 
0.1%
i 409
 
0.1%
u 104
 
< 0.1%
Other values (12) 496
 
0.2%
Common
ValueCountFrequency (%)
2 701
73.1%
172
 
17.9%
| 60
 
6.3%
> 16
 
1.7%
; 10
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 293113
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 124135
42.4%
o 41364
 
14.1%
U 41362
 
14.1%
k 41362
 
14.1%
w 41362
 
14.1%
2 701
 
0.2%
N 691
 
0.2%
r 437
 
0.1%
A 432
 
0.1%
i 409
 
0.1%
Other values (17) 858
 
0.3%

HTL_deposition_solvents
Categorical

HIGH CARDINALITY  HIGH CORRELATION  IMBALANCE 

Distinct 51
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 40805
Chlorobenzene 1114
Toluene 128
Water 94
Ethanol 38
Other values (46) 318

Length

Max length 37
Median length 7
Mean length 7.2251453
Min length 3

Characters and Unicode

Total characters 307047
Distinct characters 45
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 8
Unique (%) < 0.1%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 40805 96.0%
Chlorobenzene 1114 2.6%
Toluene 128 0.3%
Water 94 0.2%
Ethanol 38 0.1%
Toluene | Methanol 31 0.1%
IPA; Water 22 0.1%
Chlorobenzene | none 21 < 0.1%
IPA 17 < 0.1%
Diethyl sulfide 16 < 0.1%
Other values (41) 211 0.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 40822
95.1%
chlorobenzene 1215
 
2.8%
toluene 161
 
0.4%
150
 
0.3%
water 146
 
0.3%
ipa 69
 
0.2%
ethanol 61
 
0.1%
none 53
 
0.1%
methanol 40
 
0.1%
acetonitrile 21
 
< 0.1%
Other values (22) 179
 
0.4%

Most occurring characters

ValueCountFrequency (%)
n 125382
40.8%
o 43759
 
14.3%
w 40822
 
13.3%
U 40822
 
13.3%
k 40822
 
13.3%
e 4437
 
1.4%
l 1669
 
0.5%
r 1425
 
0.5%
h 1414
 
0.5%
C 1227
 
0.4%
Other values (35) 5268
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 263514
85.8%
Uppercase Letter 42835
 
14.0%
Space Separator 420
 
0.1%
Math Symbol 152
 
< 0.1%
Other Punctuation 74
 
< 0.1%
Decimal Number 27
 
< 0.1%
Dash Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 125382
47.6%
o 43759
 
16.6%
w 40822
 
15.5%
k 40822
 
15.5%
e 4437
 
1.7%
l 1669
 
0.6%
r 1425
 
0.5%
h 1414
 
0.5%
z 1224
 
0.5%
b 1224
 
0.5%
Other values (13) 1336
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
U 40822
95.3%
C 1227
 
2.9%
T 186
 
0.4%
W 146
 
0.3%
A 84
 
0.2%
E 77
 
0.2%
P 69
 
0.2%
I 69
 
0.2%
M 67
 
0.2%
D 36
 
0.1%
Other values (4) 52
 
0.1%
Math Symbol
ValueCountFrequency (%)
| 148
97.4%
> 4
 
2.6%
Other Punctuation
ValueCountFrequency (%)
; 72
97.3%
, 2
 
2.7%
Decimal Number
ValueCountFrequency (%)
2 25
92.6%
1 2
 
7.4%
Space Separator
ValueCountFrequency (%)
420
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 306349
99.8%
Common 698
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 125382
40.9%
o 43759
 
14.3%
w 40822
 
13.3%
U 40822
 
13.3%
k 40822
 
13.3%
e 4437
 
1.4%
l 1669
 
0.5%
r 1425
 
0.5%
h 1414
 
0.5%
C 1227
 
0.4%
Other values (27) 4570
 
1.5%
Common
ValueCountFrequency (%)
420
60.2%
| 148
 
21.2%
; 72
 
10.3%
2 25
 
3.6%
- 25
 
3.6%
> 4
 
0.6%
1 2
 
0.3%
, 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 307047
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 125382
40.8%
o 43759
 
14.3%
w 40822
 
13.3%
U 40822
 
13.3%
k 40822
 
13.3%
e 4437
 
1.4%
l 1669
 
0.5%
r 1425
 
0.5%
h 1414
 
0.5%
C 1227
 
0.4%
Other values (35) 5268
 
1.7%

HTL_deposition_solvents_mixing_ratios
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 14
Distinct (%) 2.8%
Missing 41988
Missing (%) 98.8%
Memory size 332.1 KiB

1 411
1 | nan 26
1 | 1 20
5; 1 15
1; 8 10
Other values (9) 27

Length

Max length 14
Median length 1
Mean length 1.9056974
Min length 1

Characters and Unicode

Total characters 970
Distinct characters 14
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3
Unique (%) 0.6%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 411 1.0%
1 | nan 26 0.1%
1 | 1 20 < 0.1%
5; 1 15 < 0.1%
1; 8 10 < 0.1%
1; 0.012 5 < 0.1%
1; 0.1 5 < 0.1%
1; 1 5 < 0.1%
0.1; 51 4 < 0.1%
nan | 1 3 < 0.1%
Other values (4) 5 < 0.1%
(Missing) 41988 98.8%
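
The values above are stored as strings that mix two delimiters. A minimal sketch of turning them into numbers, assuming that | separates successive layers and ; separates the solvents mixed within one layer (this reading is an assumption, not stated in the report). The function name `parse_ratios` is hypothetical, and values containing >> are not handled:

```python
import math


def parse_ratios(value):
    """Parse e.g. '5; 1' or '1 | nan' into one list per layer of per-solvent floats."""
    layers = []
    for layer in value.split("|"):
        # float() tolerates surrounding whitespace and maps 'nan' to float('nan').
        layers.append([float(part) for part in layer.split(";")])
    return layers


single_layer = parse_ratios("5; 1")
two_layers = parse_ratios("1 | 1")
with_gap = parse_ratios("1 | nan")
```

Keeping the nested structure avoids conflating "5; 1" (two solvents in one layer) with "1 | 1" (one solvent in each of two layers).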

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 532
80.6%
52
 
7.9%
nan 30
 
4.5%
5 15
 
2.3%
8 10
 
1.5%
0.1 9
 
1.4%
0.012 5
 
0.8%
51 4
 
0.6%
0.006 2
 
0.3%
3 1
 
0.2%

Most occurring characters

ValueCountFrequency (%)
1 550
56.7%
151
 
15.6%
n 60
 
6.2%
| 50
 
5.2%
; 47
 
4.8%
a 30
 
3.1%
0 25
 
2.6%
5 19
 
2.0%
. 16
 
1.6%
8 10
 
1.0%
Other values (4) 12
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 612
63.1%
Space Separator 151
 
15.6%
Lowercase Letter 90
 
9.3%
Other Punctuation 63
 
6.5%
Math Symbol 54
 
5.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 550
89.9%
0 25
 
4.1%
5 19
 
3.1%
8 10
 
1.6%
2 5
 
0.8%
6 2
 
0.3%
3 1
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
n 60
66.7%
a 30
33.3%
Math Symbol
ValueCountFrequency (%)
| 50
92.6%
> 4
 
7.4%
Other Punctuation
ValueCountFrequency (%)
; 47
74.6%
. 16
 
25.4%
Space Separator
ValueCountFrequency (%)
151
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 880
90.7%
Latin 90
 
9.3%

Most frequent character per script

Common
ValueCountFrequency (%)
1 550
62.5%
151
 
17.2%
| 50
 
5.7%
; 47
 
5.3%
0 25
 
2.8%
5 19
 
2.2%
. 16
 
1.8%
8 10
 
1.1%
2 5
 
0.6%
> 4
 
0.5%
Other values (2) 3
 
0.3%
Latin
ValueCountFrequency (%)
n 60
66.7%
a 30
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 550
56.7%
151
 
15.6%
n 60
 
6.2%
| 50
 
5.2%
; 47
 
4.8%
a 30
 
3.1%
0 25
 
2.6%
5 19
 
2.0%
. 16
 
1.6%
8 10
 
1.0%
Other values (4) 12
 
1.2%

Backcontact_stack_sequence
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 289
Distinct (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Au 20531
Ag 12985
Al 3138
Carbon 2132
MoO3 | Ag 693
Other values (284) 3018

Length

Max length 49
Median length 2
Mean length 2.7728546
Min length 1

Characters and Unicode

Total characters 117838
Distinct characters 63
Distinct categories 9
Distinct scripts 2
Distinct blocks 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 86
Unique (%) 0.2%

Sample

1st row Au
2nd row Au
3rd row Au
4th row Au
5th row Au

Common Values

Value Count Frequency (%)
Au 20531 48.3%
Ag 12985 30.6%
Al 3138 7.4%
Carbon 2132 5.0%
MoO3 | Ag 693 1.6%
Cu 536 1.3%
Ca | Al 307 0.7%
MoO3 | Al 120 0.3%
MoO3 | Au 117 0.3%
AgAl 77 0.2%
Other values (279) 1861 4.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
au 20914
43.3%
ag 14131
29.2%
al 3686
 
7.6%
2799
 
5.8%
carbon 2383
 
4.9%
moo3 1105
 
2.3%
cu 599
 
1.2%
ca 337
 
0.7%
moox 212
 
0.4%
ito 208
 
0.4%
Other values (170) 1978
 
4.1%

Most occurring characters

ValueCountFrequency (%)
A 39203
33.3%
u 21578
18.3%
g 14402
 
12.2%
5855
 
5.0%
l 3911
 
3.3%
o 3888
 
3.3%
C 3615
 
3.1%
a 3144
 
2.7%
n 2923
 
2.5%
| 2799
 
2.4%
Other values (53) 16520
14.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 57168
48.5%
Uppercase Letter 50022
42.4%
Space Separator 5855
 
5.0%
Math Symbol 2825
 
2.4%
Decimal Number 1344
 
1.1%
Dash Punctuation 340
 
0.3%
Other Punctuation 282
 
0.2%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u 21578
37.7%
g 14402
25.2%
l 3911
 
6.8%
o 3888
 
6.8%
a 3144
 
5.5%
n 2923
 
5.1%
r 2723
 
4.8%
b 2558
 
4.5%
t 323
 
0.6%
e 321
 
0.6%
Other values (14) 1397
 
2.4%
Uppercase Letter
ValueCountFrequency (%)
A 39203
78.4%
C 3615
 
7.2%
O 2086
 
4.2%
M 1556
 
3.1%
T 685
 
1.4%
P 523
 
1.0%
S 521
 
1.0%
I 317
 
0.6%
G 296
 
0.6%
F 246
 
0.5%
Other values (12) 974
 
1.9%
Decimal Number
ValueCountFrequency (%)
3 1183
88.0%
2 109
 
8.1%
0 18
 
1.3%
6 13
 
1.0%
4 9
 
0.7%
5 6
 
0.4%
1 6
 
0.4%
Other Punctuation
ValueCountFrequency (%)
: 131
46.5%
; 120
42.6%
@ 27
 
9.6%
' 4
 
1.4%
Math Symbol
ValueCountFrequency (%)
| 2799
99.1%
26
 
0.9%
Space Separator
ValueCountFrequency (%)
5855
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 340
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 107190
91.0%
Common 10648
 
9.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 39203
36.6%
u 21578
20.1%
g 14402
 
13.4%
l 3911
 
3.6%
o 3888
 
3.6%
C 3615
 
3.4%
a 3144
 
2.9%
n 2923
 
2.7%
r 2723
 
2.5%
b 2558
 
2.4%
Other values (36) 9245
 
8.6%
Common
ValueCountFrequency (%)
5855
55.0%
| 2799
26.3%
3 1183
 
11.1%
- 340
 
3.2%
: 131
 
1.2%
; 120
 
1.1%
2 109
 
1.0%
@ 27
 
0.3%
26
 
0.2%
0 18
 
0.2%
Other values (7) 40
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 117812
> 99.9%
Math Operators 26
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 39203
33.3%
u 21578
18.3%
g 14402
 
12.2%
5855
 
5.0%
l 3911
 
3.3%
o 3888
 
3.3%
C 3615
 
3.1%
a 3144
 
2.7%
n 2923
 
2.5%
| 2799
 
2.4%
Other values (52) 16494
14.0%
Math Operators
ValueCountFrequency (%)
26
100.0%

Backcontact_thickness_list
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 384
Distinct (%) 1.1%
Missing 7976
Missing (%) 18.8%
Memory size 332.1 KiB

100.0 12528
80.0 8638
60.0 2500
70.0 1817
120.0 1488
Other values (379) 7550

Length

Max length 32
Median length 31
Mean length 4.9911938
Min length 3

Characters and Unicode

Total characters 172301
Distinct characters 16
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 126
Unique (%) 0.4%

Sample

1st row 90.0
2nd row 90.0
3rd row 90.0
4th row 90.0
5th row 90.0

Common Values

Value Count Frequency (%)
100.0 12528 29.5%
80.0 8638 20.3%
60.0 2500 5.9%
70.0 1817 4.3%
120.0 1488 3.5%
150.0 1391 3.3%
50.0 875 2.1%
90.0 677 1.6%
10000.0 385 0.9%
200.0 338 0.8%
Other values (374) 3884 9.1%
(Missing) 7976 18.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
100.0 13421
34.6%
80.0 8892
22.9%
60.0 2538
 
6.5%
2134
 
5.5%
70.0 1887
 
4.9%
120.0 1620
 
4.2%
150.0 1461
 
3.8%
50.0 948
 
2.4%
90.0 697
 
1.8%
10.0 592
 
1.5%
Other values (174) 4599
 
11.9%

Most occurring characters

ValueCountFrequency (%)
0 87223
50.6%
. 36411
21.1%
1 18828
 
10.9%
8 9311
 
5.4%
4268
 
2.5%
5 3779
 
2.2%
6 2876
 
1.7%
2 2728
 
1.6%
7 2338
 
1.4%
| 2134
 
1.2%
Other values (6) 2405
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 128772
74.7%
Other Punctuation 36411
 
21.1%
Space Separator 4268
 
2.5%
Math Symbol 2134
 
1.2%
Lowercase Letter 708
 
0.4%
Dash Punctuation 8
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 87223
67.7%
1 18828
 
14.6%
8 9311
 
7.2%
5 3779
 
2.9%
6 2876
 
2.2%
2 2728
 
2.1%
7 2338
 
1.8%
9 806
 
0.6%
3 464
 
0.4%
4 419
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
n 472
66.7%
a 236
33.3%
Other Punctuation
ValueCountFrequency (%)
. 36411
100.0%
Space Separator
ValueCountFrequency (%)
4268
100.0%
Math Symbol
ValueCountFrequency (%)
| 2134
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 171593
99.6%
Latin 708
 
0.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0 87223
50.8%
. 36411
21.2%
1 18828
 
11.0%
8 9311
 
5.4%
4268
 
2.5%
5 3779
 
2.2%
6 2876
 
1.7%
2 2728
 
1.6%
7 2338
 
1.4%
| 2134
 
1.2%
Other values (4) 1697
 
1.0%
Latin
ValueCountFrequency (%)
n 472
66.7%
a 236
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 172301
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 87223
50.6%
. 36411
21.1%
1 18828
 
10.9%
8 9311
 
5.4%
4268
 
2.5%
5 3779
 
2.2%
6 2876
 
1.7%
2 2728
 
1.6%
7 2338
 
1.4%
| 2134
 
1.2%
Other values (6) 2405
 
1.4%

Backcontact_deposition_procedure
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 113
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Evaporation 36845
Evaporation | Evaporation 1633
Doctor blading 1187
Screen printing 881
Sputtering 269
Other values (108) 1682

Length

Max length 95
Median length 11
Mean length 12.009036
Min length 3

Characters and Unicode

Total characters 510348
Distinct characters 42
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 23
Unique (%) 0.1%

Sample

1st row Evaporation
2nd row Evaporation
3rd row Evaporation
4th row Evaporation
5th row Evaporation

Common Values

Value Count Frequency (%)
Evaporation 36845 86.7%
Evaporation | Evaporation 1633 3.8%
Doctor blading 1187 2.8%
Screen printing 881 2.1%
Sputtering 269 0.6%
Unknown 269 0.6%
Lamination 230 0.5%
Sandwiching 138 0.3%
Magnetron sputtering 106 0.2%
Evaporation | Evaporation | Evaporation 81 0.2%
Other values (103) 858 2.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
evaporation 40822
81.5%
2426
 
4.8%
doctor 1239
 
2.5%
blading 1239
 
2.5%
printing 939
 
1.9%
screen 911
 
1.8%
sputtering 610
 
1.2%
lamination 307
 
0.6%
unknown 297
 
0.6%
sandwiching 201
 
0.4%
Other values (43) 1085
 
2.2%

Most occurring characters

ValueCountFrequency (%)
o 85202
16.7%
a 84571
16.6%
n 48512
9.5%
i 46301
9.1%
t 45145
8.8%
r 45020
8.8%
p 42756
8.4%
v 40836
8.0%
E 40822
8.0%
7581
 
1.5%
Other values (32) 23602
 
4.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 454809
89.1%
Uppercase Letter 45115
 
8.8%
Space Separator 7581
 
1.5%
Math Symbol 2483
 
0.5%
Dash Punctuation 360
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 85202
18.7%
a 84571
18.6%
n 48512
10.7%
i 46301
10.2%
t 45145
9.9%
r 45020
9.9%
p 42756
9.4%
v 40836
9.0%
g 3598
 
0.8%
e 3069
 
0.7%
Other values (13) 9799
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
E 40822
90.5%
S 1801
 
4.0%
D 1384
 
3.1%
L 311
 
0.7%
U 299
 
0.7%
M 168
 
0.4%
C 102
 
0.2%
B 84
 
0.2%
P 51
 
0.1%
I 29
 
0.1%
Other values (5) 64
 
0.1%
Math Symbol
ValueCountFrequency (%)
| 2369
95.4%
> 114
 
4.6%
Space Separator
ValueCountFrequency (%)
7581
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 360
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 499924
98.0%
Common 10424
 
2.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 85202
17.0%
a 84571
16.9%
n 48512
9.7%
i 46301
9.3%
t 45145
9.0%
r 45020
9.0%
p 42756
8.6%
v 40836
8.2%
E 40822
8.2%
g 3598
 
0.7%
Other values (28) 17161
 
3.4%
Common
ValueCountFrequency (%)
7581
72.7%
| 2369
 
22.7%
- 360
 
3.5%
> 114
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 510348
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 85202
16.7%
a 84571
16.6%
n 48512
9.5%
i 46301
9.1%
t 45145
8.8%
r 45020
8.8%
p 42756
8.4%
v 40836
8.0%
E 40822
8.0%
7581
 
1.5%
Other values (32) 23602
 
4.6%

Add_lay_front_stack_sequence
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct 21
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 42363
MgF2 72
Ag-np 10
Eu(TTA)2(Phen)MAA 10
NaYF4:Eu-np 8
Other values (16) 34

Length

Max length 28
Median length 7
Mean length 6.9998588
Min length 3

Characters and Unicode

Total characters 297473
Distinct characters 52
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6
Unique (%) < 0.1%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 42363 99.7%
MgF2 72 0.2%
Ag-np 10 < 0.1%
Eu(TTA)2(Phen)MAA 10 < 0.1%
NaYF4:Eu-np 8 < 0.1%
CdSeS-QDs 4 < 0.1%
Mn:CsPbCl3-QDs 4 < 0.1%
N-Graphene-QDs 3 < 0.1%
Mica 3 < 0.1%
NaF 3 < 0.1%
Other values (11) 17 < 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 42363
99.7%
mgf2 72
 
0.2%
ag-np 10
 
< 0.1%
eu(tta)2(phen)maa 10
 
< 0.1%
nayf4:eu-np 8
 
< 0.1%
cdses-qds 4
 
< 0.1%
mn:cspbcl3-qds 4
 
< 0.1%
y2o3:eu3 4
 
< 0.1%
n-graphene-qds 3
 
< 0.1%
mica 3
 
< 0.1%
Other values (16) 27
 
0.1%

Most occurring characters

ValueCountFrequency (%)
n 127131
42.7%
o 42370
 
14.2%
U 42363
 
14.2%
k 42363
 
14.2%
w 42363
 
14.2%
M 93
 
< 0.1%
2 86
 
< 0.1%
F 85
 
< 0.1%
g 83
 
< 0.1%
A 44
 
< 0.1%
Other values (42) 492
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 254511
85.6%
Uppercase Letter 42746
 
14.4%
Decimal Number 106
 
< 0.1%
Dash Punctuation 40
 
< 0.1%
Close Punctuation 20
 
< 0.1%
Open Punctuation 20
 
< 0.1%
Other Punctuation 16
 
< 0.1%
Space Separator 11
 
< 0.1%
Math Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 127131
50.0%
o 42370
 
16.6%
k 42363
 
16.6%
w 42363
 
16.6%
g 83
 
< 0.1%
e 32
 
< 0.1%
u 27
 
< 0.1%
p 26
 
< 0.1%
s 20
 
< 0.1%
a 19
 
< 0.1%
Other values (13) 77
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
U 42363
99.1%
M 93
 
0.2%
F 85
 
0.2%
A 44
 
0.1%
E 23
 
0.1%
P 20
 
< 0.1%
T 20
 
< 0.1%
D 17
 
< 0.1%
N 15
 
< 0.1%
Q 13
 
< 0.1%
Other values (10) 53
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 86
81.1%
3 12
 
11.3%
4 8
 
7.5%
Dash Punctuation
ValueCountFrequency (%)
- 40
100.0%
Close Punctuation
ValueCountFrequency (%)
) 20
100.0%
Open Punctuation
ValueCountFrequency (%)
( 20
100.0%
Other Punctuation
ValueCountFrequency (%)
: 16
100.0%
Space Separator
ValueCountFrequency (%)
11
100.0%
Math Symbol
ValueCountFrequency (%)
| 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 297257
99.9%
Common 216
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 127131
42.8%
o 42370
 
14.3%
U 42363
 
14.3%
k 42363
 
14.3%
w 42363
 
14.3%
M 93
 
< 0.1%
F 85
 
< 0.1%
g 83
 
< 0.1%
A 44
 
< 0.1%
e 32
 
< 0.1%
Other values (33) 330
 
0.1%
Common
ValueCountFrequency (%)
2 86
39.8%
- 40
18.5%
) 20
 
9.3%
( 20
 
9.3%
: 16
 
7.4%
3 12
 
5.6%
11
 
5.1%
4 8
 
3.7%
| 3
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 297473
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 127131
42.7%
o 42370
 
14.2%
U 42363
 
14.2%
k 42363
 
14.2%
w 42363
 
14.2%
M 93
 
< 0.1%
2 86
 
< 0.1%
F 85
 
< 0.1%
g 83
 
< 0.1%
A 44
 
< 0.1%
Other values (42) 492
 
0.2%

Add_lay_back
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB

Value Count Frequency (%)
False 42469 99.9%
True 28 0.1%
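
The IMBALANCE alert on this variable reflects how lopsided the two classes are. One common way to quantify this is an entropy-based score, 1 - H / log2(k), which is 0 for perfectly balanced classes and approaches 1 when a single class dominates. The sketch below applies it to the counts above; the function name `imbalance_score` is hypothetical, and the report does not state which formula or threshold the profiler actually uses:

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log2(k), where H is the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts for Add_lay_back: False = 42469, True = 28.
score = imbalance_score([42469, 28])
balanced = imbalance_score([10, 10])
```

For the counts shown, the score comes out just above 0.99, reflecting that only 28 of 42497 rows are True.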

Encapsulation
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 41.6 KiB

Value Count Frequency (%)
False 40754 95.9%
True 1743 4.1%

Encapsulation_stack_sequence
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct 118
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 332.1 KiB

Unknown 41286
SLG 277
Cover glass-QDs 121
Epoxy 94
UV-curable epoxy 83
Other values (113) 636

Length

Max length 67
Median length 7
Mean length 7.135186
Min length 3

Characters and Unicode

Total characters 303224
Distinct characters 64
Distinct categories 9
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 33
Unique (%) 0.1%

Sample

1st row Unknown
2nd row Unknown
3rd row Unknown
4th row Unknown
5th row Unknown

Common Values

Value Count Frequency (%)
Unknown 41286 97.2%
SLG 277 0.7%
Cover glass-QDs 121 0.3%
Epoxy 94 0.2%
UV-curable epoxy 83 0.2%
PMMA 58 0.1%
UV-curable epoxy | Cover glass-QDs 29 0.1%
Polyisobutene 26 0.1%
Polymer 25 0.1%
UV-glue | SLG 25 0.1%
Other values (108) 473 1.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unknown 41286
94.5%
slg 416
 
1.0%
epoxy 297
 
0.7%
233
 
0.5%
cover 174
 
0.4%
glass-qds 172
 
0.4%
uv-curable 118
 
0.3%
pmma 59
 
0.1%
uv-glue 54
 
0.1%
surlyn 49
 
0.1%
Other values (135) 833
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n 124143
40.9%
o 42008
 
13.9%
U 41533
 
13.7%
w 41313
 
13.6%
k 41287
 
13.6%
1194
 
0.4%
e 972
 
0.3%
s 775
 
0.3%
l 763
 
0.3%
r 647
 
0.2%
Other values (54) 8589
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 256039
84.4%
Uppercase Letter 44846
 
14.8%
Space Separator 1194
 
0.4%
Dash Punctuation 407
 
0.1%
Decimal Number 263
 
0.1%
Math Symbol 233
 
0.1%
Close Punctuation 85
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Other Punctuation 72
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 124143
48.5%
o 42008
 
16.4%
w 41313
 
16.1%
k 41287
 
16.1%
e 972
 
0.4%
s 775
 
0.3%
l 763
 
0.3%
r 647
 
0.3%
a 615
 
0.2%
y 505
 
0.2%
Other values (13) 3011
 
1.2%
Uppercase Letter
ValueCountFrequency (%)
U 41533
92.6%
S 507
 
1.1%
L 428
 
1.0%
G 428
 
1.0%
V 286
 
0.6%
P 237
 
0.5%
C 219
 
0.5%
D 198
 
0.4%
E 192
 
0.4%
Q 172
 
0.4%
Other values (12) 646
 
1.4%
Decimal Number
ValueCountFrequency (%)
3 91
34.6%
2 69
26.2%
0 29
 
11.0%
5 27
 
10.3%
1 22
 
8.4%
4 14
 
5.3%
9 9
 
3.4%
6 1
 
0.4%
8 1
 
0.4%
Other Punctuation
ValueCountFrequency (%)
, 35
48.6%
; 26
36.1%
: 6
 
8.3%
' 3
 
4.2%
. 2
 
2.8%
Space Separator
ValueCountFrequency (%)
1194
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 407
100.0%
Math Symbol
ValueCountFrequency (%)
| 233
100.0%
Close Punctuation
ValueCountFrequency (%)
) 85
100.0%
Open Punctuation
ValueCountFrequency (%)
( 85
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 300885
99.2%
Common 2339
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 124143
41.3%
o 42008
 
14.0%
U 41533
 
13.8%
w 41313
 
13.7%
k 41287
 
13.7%
e 972
 
0.3%
s 775
 
0.3%
l 763
 
0.3%
r 647
 
0.2%
a 615
 
0.2%
Other values (35) 6829
 
2.3%
Common
ValueCountFrequency (%)
1194
51.0%
- 407
 
17.4%
| 233
 
10.0%
3 91
 
3.9%
) 85
 
3.6%
( 85
 
3.6%
2 69
 
2.9%
, 35
 
1.5%
0 29
 
1.2%
5 27
 
1.2%
Other values (9) 84
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 303224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 124143
40.9%
o 42008
 
13.9%
U 41533
 
13.7%
w 41313
 
13.6%
k 41287
 
13.6%
1194
 
0.4%
e 972
 
0.3%
s 775
 
0.3%
l 763
 
0.3%
r 647
 
0.2%
Other values (54) 8589
 
2.8%

JV_light_intensity
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct 110
Distinct (%) 0.3%
Missing 65
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Mean 99.948904
Minimum 0
Maximum 1800
Zeros 8
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 332.1 KiB

Quantile statistics

Minimum 0
5-th percentile 100
Q1 100
Median 100
Q3 100
95-th percentile 100
Maximum 1800
Range 1800
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 19.274965
Coefficient of variation (CV) 0.19284819
Kurtosis 5808.1874
Mean 99.948904
Median Absolute Deviation (MAD) 0
Skewness 71.515835
Sum 4241031.9
Variance 371.52428
Monotonicity Not monotonic
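
Several of the figures above are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum divided by the mean gives the number of non-missing observations (42497 rows minus 65 missing). A short check using only the numbers reported here; the variable names are illustrative:

```python
import math

# Values copied from the statistics reported for JV_light_intensity.
mean = 99.948904
std = 19.274965
variance = 371.52428
cv_reported = 0.19284819
total = 4241031.9
n_rows, n_missing = 42497, 65

cv = std / mean                    # coefficient of variation
var_from_std = std ** 2            # variance recomputed from the standard deviation
n_non_missing = round(total / mean)  # observations implied by sum and mean
```

All three recomputed values agree with the reported ones to the printed precision, which suggests the statistics are computed over the non-missing rows only.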
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
100 42021 98.9%
200 25 0.1%
50 22 0.1%
10 19 < 0.1%
0.146 16 < 0.1%
98 16 < 0.1%
95 15 < 0.1%
80 12 < 0.1%
95.6 12 < 0.1%
81 11 < 0.1%
Other values (100) 263 0.6%
(Missing) 65 0.2%

Smallest values
Value Count Frequency (%)
0 8 < 0.1%
0.000146 2 < 0.1%
0.0058 5 < 0.1%
0.03 7 < 0.1%
0.062 1 < 0.1%
0.1 1 < 0.1%
0.14 1 < 0.1%
0.146 16 < 0.1%
0.1516 3 < 0.1%
0.2754 7 < 0.1%

Largest values
Value Count Frequency (%)
1800 2 < 0.1%
1600 3 < 0.1%
1300 1 < 0.1%
500 1 < 0.1%
480 1 < 0.1%
300 1 < 0.1%
200 25 0.1%
150 3 < 0.1%
137 4 < 0.1%
120 1 < 0.1%

JV_light_spectra
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct 4
Distinct (%) < 0.1%
Missing 2491
Missing (%) 5.9%
Memory size 332.1 KiB

AM 1.5 39906
Am 1.5 56
Indoor light 43
Monochromatic 1

Length

Max length 13
Median length 6
Mean length 6.006624
Min length 6

Characters and Unicode

Total characters 240301
Distinct characters 19
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row AM 1.5
2nd row AM 1.5
3rd row AM 1.5
4th row AM 1.5
5th row AM 1.5

Common Values

Value            Count    Frequency (%)
AM 1.5           39906    93.9%
Am 1.5           56       0.1%
Indoor light     43       0.1%
Monochromatic    1        < 0.1%
(Missing)        2491     5.9%
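
The table lists "AM 1.5" and "Am 1.5" as separate categories even though they differ only in letter case. A minimal sketch of one way to merge such variants follows. The helper name `merge_case_variants` and the rule of keeping the most frequent spelling are assumptions made for illustration, not something the report or the profiling tool prescribes.

```python
# Counts copied from the Common Values table for JV_light_spectra.
RAW_COUNTS = {
    "AM 1.5": 39906,
    "Am 1.5": 56,
    "Indoor light": 43,
    "Monochromatic": 1,
}


def merge_case_variants(counts):
    """Merge labels that differ only by letter case.

    Hypothetical helper: the most frequent spelling in each group is kept
    as the canonical label and the counts of all variants are summed.
    """
    groups = {}
    for label, count in counts.items():
        groups.setdefault(label.casefold(), []).append((count, label))

    merged = {}
    for variants in groups.values():
        _, canonical = max(variants)          # variant with the highest count
        merged[canonical] = sum(count for count, _ in variants)
    return merged


merged = merge_case_variants(RAW_COUNTS)
for label, count in merged.items():
    print(f"{label}: {count}")
```

After merging, three categories remain and the "AM 1.5" total equals the sum of both spellings (39906 + 56).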

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value            Count    Frequency (%)
am               39962    49.9%
1.5              39962    49.9%
indoor           43       0.1%
light            43       0.1%
monochromatic    1        < 0.1%

Most occurring characters

Value               Count    Frequency (%)
(space)             40005    16.6%
A                   39962    16.6%
1                   39962    16.6%
.                   39962    16.6%
5                   39962    16.6%
M                   39907    16.6%
o                   89       < 0.1%
m                   57       < 0.1%
t                   44       < 0.1%
n                   44       < 0.1%
Other values (9)    307      0.1%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       79924    33.3%
Uppercase Letter     79912    33.3%
Space Separator      40005    16.6%
Other Punctuation    39962    16.6%
Lowercase Letter     498      0.2%

Most frequent character per category

Lowercase Letter
Value               Count    Frequency (%)
o                   89       17.9%
m                   57       11.4%
t                   44       8.8%
n                   44       8.8%
r                   44       8.8%
h                   44       8.8%
i                   44       8.8%
g                   43       8.6%
d                   43       8.6%
l                   43       8.6%
Other values (2)    3        0.6%

Uppercase Letter
Value    Count    Frequency (%)
A        39962    50.0%
M        39907    49.9%
I        43       0.1%

Decimal Number
Value    Count    Frequency (%)
1        39962    50.0%
5        39962    50.0%

Space Separator
Value      Count    Frequency (%)
(space)    40005    100.0%

Other Punctuation
Value    Count    Frequency (%)
.        39962    100.0%

Most occurring scripts

Value     Count     Frequency (%)
Common    159891    66.5%
Latin     80410     33.5%

Most frequent character per script

Latin
Value               Count    Frequency (%)
A                   39962    49.7%
M                   39907    49.6%
o                   89       0.1%
m                   57       0.1%
t                   44       0.1%
n                   44       0.1%
r                   44       0.1%
h                   44       0.1%
i                   44       0.1%
g                   43       0.1%
Other values (5)    132      0.2%

Common
Value      Count    Frequency (%)
(space)    40005    25.0%
1          39962    25.0%
.          39962    25.0%
5          39962    25.0%

Most occurring blocks

Value    Count     Frequency (%)
ASCII    240301    100.0%

Most frequent character per block

ASCII
Value               Count    Frequency (%)
(space)             40005    16.6%
A                   39962    16.6%
1                   39962    16.6%
.                   39962    16.6%
5                   39962    16.6%
M                   39907    16.6%
o                   89       < 0.1%
m                   57       < 0.1%
t                   44       < 0.1%
n                   44       < 0.1%
Other values (9)    307      0.1%

JV_scan_speed
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING, SKEWED

Distinct        175
Distinct (%)    1.1%
Missing         26362
Missing (%)     62.0%
Infinite        0
Infinite (%)    0.0%
Mean            670.62527
Minimum         0.01
Maximum         5000000
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     332.1 KiB

Quantile statistics

Minimum                      0.01
5-th percentile              10
Q1                           33
median                       100
Q3                           150
95-th percentile             607.5
Maximum                      5000000
Range                        5000000
Interquartile range (IQR)    117

Descriptive statistics

Standard deviation                 39884.645
Coefficient of variation (CV)      59.473818
Kurtosis                           15305.475
Mean                               670.62527
Median Absolute Deviation (MAD)    60
Skewness                           122.33927
Sum                                10820539
Variance                           1.5907849 × 10^9
Monotonicity                       Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count    Frequency (%)
100                   4288     10.1%
50                    2190     5.2%
10                    1990     4.7%
200                   983      2.3%
20                    963      2.3%
150                   509      1.2%
30                    487      1.1%
300                   471      1.1%
1000                  407      1.0%
500                   383      0.9%
Other values (165)    3464     8.2%
(Missing)             26362    62.0%

Minimum 10 values

Value    Count    Frequency (%)
0.01     2        < 0.1%
0.02     5        < 0.1%
0.1      29       0.1%
0.2      3        < 0.1%
0.25     1        < 0.1%
0.28     2        < 0.1%
0.3      4        < 0.1%
0.4      8        < 0.1%
0.6      4        < 0.1%
1        17       < 0.1%

Maximum 10 values

Value      Count    Frequency (%)
5000000    1        < 0.1%
500000     2        < 0.1%
165000     3        < 0.1%
100000     7        < 0.1%
50000      1        < 0.1%
30000      5        < 0.1%
20000      22       0.1%
18800      1        < 0.1%
10000      26       0.1%
7300       2        < 0.1%
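
The SKEWED alert for JV_scan_speed reflects a mean (670.6) far above the median (100). A rough way to see why is to check how much of the reported sum comes from the ten largest distinct values listed for this variable. The sketch below only uses figures copied from this report and assumes, as elsewhere, that statistics are computed over non-missing cells. The variable names are illustrative.

```python
# (value, count) pairs copied from the ten largest values of JV_scan_speed.
LARGEST = [
    (5000000, 1), (500000, 2), (165000, 3), (100000, 7), (50000, 1),
    (30000, 5), (20000, 22), (18800, 1), (10000, 26), (7300, 2),
]
N_ROWS = 42497            # observations in the dataset (report overview)
N_MISSING = 26362
REPORTED_SUM = 10820539

# Assumption: missing cells are excluded from the statistics.
n_valid = N_ROWS - N_MISSING

tail_rows = sum(count for _, count in LARGEST)
tail_sum = sum(value * count for value, count in LARGEST)

mean_all = REPORTED_SUM / n_valid
mean_without_tail = (REPORTED_SUM - tail_sum) / (n_valid - tail_rows)
tail_share = tail_sum / REPORTED_SUM

print(f"valid rows:            {n_valid}")
print(f"rows in the tail:      {tail_rows}")
print(f"mean (all rows):       {mean_all:.2f}")
print(f"mean (tail removed):   {mean_without_tail:.2f}")
print(f"tail share of the sum: {tail_share:.1%}")
```

A few dozen rows out of roughly sixteen thousand account for most of the sum, which is why the median and MAD describe the typical scan speed better than the mean and standard deviation do here.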

JV_scan_delay_time
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct        10
Distinct (%)    7.9%
Missing         42370
Missing (%)     99.7%
Infinite        0
Infinite (%)    0.0%
Mean            192.71496
Minimum         0.3
Maximum         1000
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     332.1 KiB

Quantile statistics

Minimum                      0.3
5-th percentile              1
Q1                           50
median                       100
Q3                           200
95-th percentile             500
Maximum                      1000
Range                        999.7
Interquartile range (IQR)    150

Descriptive statistics

Standard deviation                 206.77884
Coefficient of variation (CV)      1.0729776
Kurtosis                           3.7770974
Mean                               192.71496
Median Absolute Deviation (MAD)    99
Skewness                           1.8306026
Sum                                24474.8
Variance                           42757.487
Monotonicity                       Not monotonic

[Figure: Histogram with fixed size bins (bins=10)]
Common values

Value        Count    Frequency (%)
200          30       0.1%
100          28       0.1%
500          22       0.1%
50           21       < 0.1%
10           9        < 0.1%
0.3          6        < 0.1%
1            3        < 0.1%
1000         3        < 0.1%
150          3        < 0.1%
40           2        < 0.1%
(Missing)    42370    99.7%

Minimum 10 values

Value    Count    Frequency (%)
0.3      6        < 0.1%
1        3        < 0.1%
10       9        < 0.1%
40       2        < 0.1%
50       21       < 0.1%
100      28       0.1%
150      3        < 0.1%
200      30       0.1%
500      22       0.1%
1000     3        < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
1000     3        < 0.1%
500      22       0.1%
200      30       0.1%
150      3        < 0.1%
100      28       0.1%
50       21       < 0.1%
40       2        < 0.1%
10       9        < 0.1%
1        3        < 0.1%
0.3      6        < 0.1%

JV_scan_integration_time
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct        4
Distinct (%)    20.0%
Missing         42477
Missing (%)     > 99.9%
Memory size     332.1 KiB
30.0
20.0
16.67
3.0

Length

Max length       5
Median length    4
Mean length      4.05
Min length       3

Characters and Unicode

Total characters       81
Distinct characters    7
Distinct categories    2
Distinct scripts       1
Distinct blocks        1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    16.67
2nd row    16.67
3rd row    16.67
4th row    16.67
5th row    30.0

Common Values

Value        Count    Frequency (%)
30.0         7        < 0.1%
20.0         6        < 0.1%
16.67        4        < 0.1%
3.0          3        < 0.1%
(Missing)    42477    > 99.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value    Count    Frequency (%)
30.0     7        35.0%
20.0     6        30.0%
16.67    4        20.0%
3.0      3        15.0%

Most occurring characters

ValueCountFrequency (%)
0 29
35.8%
. 20
24.7%
3 10
 
12.3%
6 8
 
9.9%
2 6
 
7.4%
1 4
 
4.9%
7 4
 
4.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 61
75.3%
Other Punctuation 20
 
24.7%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0        29       47.5%
3        10       16.4%
6        8        13.1%
2        6        9.8%
1        4        6.6%
7        4        6.6%

Other Punctuation
Value    Count    Frequency (%)
.        20       100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    81       100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
0        29       35.8%
.        20       24.7%
3        10       12.3%
6        8        9.9%
2        6        7.4%
1        4        4.9%
7        4        4.9%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    81       100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0        29       35.8%
.        20       24.7%
3        10       12.3%
6        8        9.9%
2        6        7.4%
1        4        4.9%
7        4        4.9%

JV_preconditioning_protocol
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct        14
Distinct (%)    1.4%
Missing         41497
Missing (%)     97.6%
Memory size     332.1 KiB

Value                               Count
Light soaking                       479
Potential biasing                   417
Light soaking; Potential biasing    26
Heating                             19
Cooling                             18
Other values (9)                    41

Length

Max length       32
Median length    31
Mean length      15.066
Min length       4

Characters and Unicode

Total characters       15066
Distinct characters    33
Distinct categories    4
Distinct scripts       2
Distinct blocks        1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        1
Unique (%)    0.1%

Sample

1st row    Light soaking
2nd row    Light soaking
3rd row    Light soaking
4th row    Light soaking
5th row    Light soaking

Common Values

Value                               Count    Frequency (%)
Light soaking                       479      1.1%
Potential biasing                   417      1.0%
Light soaking; Potential biasing    26       0.1%
Heating                             19       < 0.1%
Cooling                             18       < 0.1%
Voc stabilization                   12       < 0.1%
MPPT                                9        < 0.1%
Heating; Light soaking              5        < 0.1%
Light Soaking; Potential biasing    4        < 0.1%
Bending                             3        < 0.1%
Other values (4)                    8        < 0.1%
(Missing)                           41497    97.6%
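
Several entries in this table combine more than one protocol step in a single string, separated by ";", and one entry differs from another only by capitalisation ("Light Soaking" versus "Light soaking"). The sketch below shows one possible way to count individual steps instead of whole strings. The function name `count_steps` and the case-folding rule are illustrative assumptions, and the four unnamed "Other values" categories are left out because the report does not list them.

```python
from collections import Counter

# (label, count) rows copied from the Common Values table above.
# The 4 categories grouped under "Other values" are not named in the
# report, so they are omitted here.
ROWS = [
    ("Light soaking", 479),
    ("Potential biasing", 417),
    ("Light soaking; Potential biasing", 26),
    ("Heating", 19),
    ("Cooling", 18),
    ("Voc stabilization", 12),
    ("MPPT", 9),
    ("Heating; Light soaking", 5),
    ("Light Soaking; Potential biasing", 4),
    ("Bending", 3),
]


def count_steps(rows):
    """Count individual protocol steps across semicolon-separated labels.

    Hypothetical helper: each label is split on ';', surrounding
    whitespace is stripped and case is folded before counting.
    """
    steps = Counter()
    for label, count in rows:
        for part in label.split(";"):
            steps[part.strip().casefold()] += count
    return steps


steps = count_steps(ROWS)
for step, count in steps.most_common():
    print(f"{step}: {count}")
```

Counted this way, the ten listed categories reduce to seven distinct steps, and the totals for "potential biasing" and "heating" match the word frequencies the report gives for "biasing" and "heating".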

Length

[Figure: Histogram of lengths of the category]

Words

Value               Count    Frequency (%)
light               517      25.6%
soaking             517      25.6%
potential           449      22.3%
biasing             447      22.2%
heating             24       1.2%
cooling             18       0.9%
voc                 12       0.6%
stabilization       12       0.6%
mppt                9        0.4%
bending             3        0.1%
Other values (4)    9        0.4%

Most occurring characters

Value                Count    Frequency (%)
i                    2462     16.3%
g                    1528     10.1%
n                    1490     9.9%
t                    1467     9.7%
a                    1463     9.7%
o                    1031     6.8%
(space)              1017     6.8%
s                    975      6.5%
k                    522      3.5%
L                    517      3.4%
Other values (23)    2594     17.2%

Most occurring categories

Value                Count    Frequency (%)
Lowercase Letter     12943    85.9%
Uppercase Letter     1069     7.1%
Space Separator      1017     6.8%
Other Punctuation    37       0.2%

Most frequent character per category

Lowercase Letter
Value                Count    Frequency (%)
i                    2462     19.0%
g                    1528     11.8%
n                    1490     11.5%
t                    1467     11.3%
a                    1463     11.3%
o                    1031     8.0%
s                    975      7.5%
k                    522      4.0%
h                    517      4.0%
e                    490      3.8%
Other values (10)    998      7.7%

Uppercase Letter
Value    Count    Frequency (%)
L        517      48.4%
P        467      43.7%
H        24       2.2%
C        18       1.7%
V        12       1.1%
M        9        0.8%
T        9        0.8%
S        5        0.5%
B        3        0.3%
U        3        0.3%

Space Separator
Value      Count    Frequency (%)
(space)    1017     100.0%

Other Punctuation
Value    Count    Frequency (%)
;        37       100.0%

Most occurring scripts

Value     Count    Frequency (%)
Latin     14012    93.0%
Common    1054     7.0%

Most frequent character per script

Latin
Value                Count    Frequency (%)
i                    2462     17.6%
g                    1528     10.9%
n                    1490     10.6%
t                    1467     10.5%
a                    1463     10.4%
o                    1031     7.4%
s                    975      7.0%
k                    522      3.7%
L                    517      3.7%
h                    517      3.7%
Other values (21)    2040     14.6%

Common
Value      Count    Frequency (%)
(space)    1017     96.5%
;          37       3.5%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    15066    100.0%

Most frequent character per block

ASCII
Value                Count    Frequency (%)
i                    2462     16.3%
g                    1528     10.1%
n                    1490     9.9%
t                    1467     9.7%
a                    1463     9.7%
o                    1031     6.8%
(space)              1017     6.8%
s                    975      6.5%
k                    522      3.5%
L                    517      3.4%
Other values (23)    2594     17.2%

JV_preconditioning_time
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct        21
Distinct (%)    9.6%
Missing         42279
Missing (%)     99.5%
Infinite        0
Infinite (%)    0.0%
Mean            596.3211
Minimum         2
Maximum         6000
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     332.1 KiB

Quantile statistics

Minimum                      2
5-th percentile              2
Q1                           10
median                       60
Q3                           600
95-th percentile             1800
Maximum                      6000
Range                        5998
Interquartile range (IQR)    590

Descriptive statistics

Standard deviation                 1199.0264
Coefficient of variation (CV)      2.010706
Kurtosis                           10.447455
Mean                               596.3211
Median Absolute Deviation (MAD)    58
Skewness                           3.1682549
Sum                                129998
Variance                           1437664.3
Monotonicity                       Not monotonic

[Figure: Histogram with fixed size bins (bins=21)]
Common values

Value                Count    Frequency (%)
10                   33       0.1%
600                  25       0.1%
1800                 22       0.1%
2                    22       0.1%
60                   21       < 0.1%
5                    14       < 0.1%
30                   13       < 0.1%
900                  13       < 0.1%
300                  10       < 0.1%
3                    8        < 0.1%
Other values (11)    37       0.1%
(Missing)            42279    99.5%

Minimum 10 values

Value    Count    Frequency (%)
2        22       0.1%
3        8        < 0.1%
5        14       < 0.1%
10       33       0.1%
20       3        < 0.1%
30       13       < 0.1%
40       5        < 0.1%
45       8        < 0.1%
60       21       < 0.1%
120      4        < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
6000     5        < 0.1%
4800     5        < 0.1%
1800     22       0.1%
1200     1        < 0.1%
900      13       < 0.1%
600      25       0.1%
500      3        < 0.1%
360      1        < 0.1%
300      10       < 0.1%
240      1        < 0.1%

JV_preconditioning_potential
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct        19
Distinct (%)    14.1%
Missing         42362
Missing (%)     99.7%
Infinite        0
Infinite (%)    0.0%
Mean            0.85274074
Minimum         -1.5
Maximum         3
Zeros           4
Zeros (%)       < 0.1%
Negative        31
Negative (%)    0.1%
Memory size     332.1 KiB

Quantile statistics

Minimum                      -1.5
5-th percentile              -0.9
Q1                           0
median                       1.2
Q3                           1.4
95-th percentile             2
Maximum                      3
Range                        4.5
Interquartile range (IQR)    1.4

Descriptive statistics

Standard deviation                 0.91148512
Coefficient of variation (CV)      1.0688889
Kurtosis                           -0.021268119
Mean                               0.85274074
Median Absolute Deviation (MAD)    0.2
Skewness                           -0.91484955
Sum                                115.12
Variance                           0.83080512
Monotonicity                       Not monotonic

[Figure: Histogram with fixed size bins (bins=19)]
Common values

Value               Count    Frequency (%)
1.2                 41       0.1%
1.4                 28       0.1%
2                   11       < 0.1%
-0.5                7        < 0.1%
-0.1                6        < 0.1%
0.97                6        < 0.1%
1.5                 6        < 0.1%
0                   4        < 0.1%
-0.3                4        < 0.1%
-0.7                4        < 0.1%
Other values (9)    18       < 0.1%
(Missing)           42362    99.7%

Minimum 10 values

Value    Count    Frequency (%)
-1.5     2        < 0.1%
-1.2     3        < 0.1%
-1       1        < 0.1%
-0.9     4        < 0.1%
-0.7     4        < 0.1%
-0.5     7        < 0.1%
-0.3     4        < 0.1%
-0.1     6        < 0.1%
0        4        < 0.1%
0.5      2        < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
3        1        < 0.1%
2        11       < 0.1%
1.5      6        < 0.1%
1.4      28       0.1%
1.3      2        < 0.1%
1.2      41       0.1%
1        2        < 0.1%
0.97     6        < 0.1%
0.6      1        < 0.1%
0.5      2        < 0.1%

JV_default_PCE
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct        2318
Distinct (%)    5.6%
Missing         925
Missing (%)     2.2%
Infinite        0
Infinite (%)    0.0%
Mean            12.021482
Minimum         0
Maximum         36.2
Zeros           74
Zeros (%)       0.2%
Negative        0
Negative (%)    0.0%
Memory size     332.1 KiB

Quantile statistics

Minimum                      0
5-th percentile              2
Q1                           8.45
median                       12.72
Q3                           16.1
95-th percentile             19.35
Maximum                      36.2
Range                        36.2
Interquartile range (IQR)    7.65

Descriptive statistics

Standard deviation                 5.2368397
Coefficient of variation (CV)      0.43562346
Kurtosis                           -0.50177351
Mean                               12.021482
Median Absolute Deviation (MAD)    3.72
Skewness                           -0.42513749
Sum                                499757.06
Variance                           27.42449
Monotonicity                       Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                  Count    Frequency (%)
14                     186      0.4%
13                     181      0.4%
15                     177      0.4%
15.2                   171      0.4%
12                     167      0.4%
14.2                   163      0.4%
14.5                   153      0.4%
13.3                   152      0.4%
16                     151      0.4%
17                     149      0.4%
Other values (2308)    39922    93.9%
(Missing)              925      2.2%

Minimum 10 values

Value          Count    Frequency (%)
0              74       0.2%
0.000225144    1        < 0.1%
0.00042        1        < 0.1%
0.0007         1        < 0.1%
0.00083        1        < 0.1%
0.001          1        < 0.1%
0.0013804      1        < 0.1%
0.002          1        < 0.1%
0.003          1        < 0.1%
0.004          1        < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
36.2     1        < 0.1%
35.9     1        < 0.1%
35.2     3        < 0.1%
34.8     1        < 0.1%
33.4     1        < 0.1%
33       1        < 0.1%
32.3     1        < 0.1%
31.4     1        < 0.1%
30.6     1        < 0.1%
30.3     1        < 0.1%
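
The quantile statistics for JV_default_PCE are enough to apply a simple outlier screen. The sketch below uses Tukey's 1.5 × IQR fences, a common convention that the report itself does not apply. It is shown only as an illustration, with Q1, Q3 and the ten largest values copied from this section.

```python
# Quantiles copied from the JV_default_PCE quantile statistics.
Q1 = 8.45
Q3 = 16.1

# The ten largest distinct values listed for JV_default_PCE.
LARGEST = [36.2, 35.9, 35.2, 34.8, 33.4, 33, 32.3, 31.4, 30.6, 30.3]

# Tukey fences: an assumed screening rule, not part of the report.
iqr = Q3 - Q1
lower_fence = Q1 - 1.5 * iqr
upper_fence = Q3 + 1.5 * iqr

flagged = [value for value in LARGEST if value > upper_fence]

print(f"IQR         = {iqr:.2f}")
print(f"lower fence = {lower_fence:.3f}")
print(f"upper fence = {upper_fence:.3f}")
print(f"listed values above the upper fence: {len(flagged)} of {len(LARGEST)}")
```

The lower fence is negative, so it cannot flag anything in a variable whose minimum is 0. Every one of the ten largest listed values lies above the upper fence, which marks them as candidates for manual review rather than as confirmed errors.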

Interactions

[Figures: interaction plots (81 panels)]

Correlations

[Figure: correlation heatmap]

Columns (in order, same as the rows below): Cell_area_measured, Module_area_total, Perovskite_deposition_number_of_deposition_steps, JV_light_intensity, JV_scan_speed, JV_scan_delay_time, JV_preconditioning_time, JV_preconditioning_potential, JV_default_PCE, Cell_architecture, Cell_flexible, Module, ETL_surface_treatment_before_next_deposition_step, Perovskite_dimension_2D, Perovskite_dimension_3D, Perovskite_dimension_3D_with_2D_capping_layer, Perovskite_composition_perovskite_ABC3_structure, Perovskite_composition_b_ions, Perovskite_composition_c_ions, Perovskite_composition_none_stoichiometry_components_in_excess, Perovskite_band_gap_graded, Perovskite_deposition_quenching_induced_crystallisation, Perovskite_deposition_quenching_media, Perovskite_deposition_quenching_media_additives_compounds, Perovskite_deposition_thermal_annealing_atmosphere, Perovskite_deposition_solvent_annealing, Perovskite_deposition_solvent_annealing_solvent_atmosphere, Perovskite_surface_treatment_before_next_deposition_step, HTL_deposition_synthesis_atmosphere, HTL_deposition_solvents, HTL_deposition_solvents_mixing_ratios, Add_lay_front_stack_sequence, Add_lay_back, Encapsulation, JV_light_spectra, JV_scan_integration_time, JV_preconditioning_protocol

Cell_area_measured  1.000 0.749 -0.012 -0.053 -0.139 0.519 0.045 -0.170 0.017 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 1.000 1.000
Module_area_total  0.749 1.000 0.134 NaN 0.033 NaN 0.893 NaN 0.335 0.000 0.202 0.121 1.000 1.000 0.292 0.292 0.000 0.000 0.082 0.000 1.000 0.222 0.171 1.000 0.300 0.000 0.000 0.000 0.000 0.000 1.000 0.150 0.000 0.341 0.000 0.000 0.987
Perovskite_deposition_number_of_deposition_steps  -0.012 0.134 1.000 -0.017 0.022 0.263 -0.015 0.476 -0.089 0.058 0.011 0.041 0.286 0.085 0.077 0.087 0.000 0.494 0.366 0.243 0.300 0.500 0.185 0.000 0.296 0.011 0.017 0.000 0.059 0.022 0.000 0.000 0.012 0.025 0.031 0.801 0.145
JV_light_intensity  -0.053 NaN -0.017 1.000 0.004 NaN NaN NaN -0.008 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.010 0.000 1.000 0.030 0.012 0.000 1.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 1.000 0.000
JV_scan_speed  -0.139 0.033 0.022 0.004 1.000 -0.309 -0.191 0.092 -0.059 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.044 1.000 0.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 1.000 1.000
JV_scan_delay_time  0.519 NaN 0.263 NaN -0.309 1.000 NaN NaN -0.298 0.170 0.363 1.000 1.000 0.551 0.551 1.000 1.000 0.171 0.221 0.558 1.000 0.265 0.246 1.000 0.377 0.104 1.000 0.000 0.310 0.424 0.975 1.000 1.000 0.117 0.000 0.000 0.000
JV_preconditioning_time  0.045 0.893 -0.015 NaN -0.191 NaN 1.000 0.030 -0.258 0.299 1.000 0.084 0.000 1.000 1.000 1.000 1.000 0.505 0.473 1.000 1.000 0.569 0.657 1.000 0.000 1.000 1.000 0.000 0.000 0.000 1.000 0.000 1.000 0.893 1.000 0.000 0.705
JV_preconditioning_potential  -0.170 NaN 0.476 NaN 0.092 NaN 0.030 1.000 0.147 0.493 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.157 0.166 0.472 1.000 0.396 0.045 0.000 0.402 1.000 1.000 0.000 0.000 0.000 0.000 1.000 1.000 0.000 1.000 0.000 0.287
JV_default_PCE  0.017 0.335 -0.089 -0.008 -0.059 -0.298 -0.258 0.147 1.000 0.041 0.045 0.040 0.298 0.121 0.127 0.129 0.172 0.144 0.134 0.191 0.006 0.348 0.129 0.335 0.034 0.047 0.018 1.000 0.023 0.031 0.194 0.038 0.009 0.039 0.395 0.322 0.435
Cell_architecture  0.000 0.000 0.058 0.000 0.000 0.170 0.299 0.493 0.041 1.000 0.094 0.039 0.317 0.062 0.043 0.000 0.021 0.133 0.066 0.136 0.009 0.062 0.083 0.241 0.013 0.112 0.108 0.000 0.035 0.063 0.409 0.000 0.000 0.072 0.008 0.841 0.147
Cell_flexible  0.000 0.202 0.011 0.000 0.000 0.363 1.000 1.000 0.045 0.094 1.000 0.026 0.654 0.017 0.022 0.008 0.008 0.043 0.009 0.124 0.000 0.022 0.119 0.353 0.003 0.013 0.000 1.000 0.051 0.077 0.262 0.072 0.000 0.042 0.000 1.000 0.363
Module  0.000 0.121 0.041 0.000 0.000 1.000 0.084 1.000 0.040 0.039 0.026 1.000 0.317 0.012 0.010 0.000 0.006 0.000 0.000 0.071 0.000 0.026 0.058 0.000 0.062 0.012 0.070 1.000 0.027 0.016 0.148 0.000 0.023 0.057 0.000 1.000 0.183
ETL_surface_treatment_before_next_deposition_step  1.000 1.000 0.286 1.000 1.000 1.000 0.000 0.000 0.298 0.317 0.654 0.317 1.000 1.000 0.123 0.000 1.000 0.000 0.241 0.470 1.000 0.550 0.479 0.975 0.588 0.000 1.000 0.000 0.685 0.398 0.967 1.000 1.000 0.561 1.000 1.000 0.000
Perovskite_dimension_2D  0.000 1.000 0.085 0.000 0.000 0.551 1.000 1.000 0.121 0.062 0.017 0.012 1.000 1.000 0.854 0.008 0.113 0.121 0.083 0.292 0.016 0.067 0.062 0.609 0.032 0.001 0.000 1.000 0.027 0.000 0.000 0.000 0.000 0.021 0.000 1.000 1.000
Perovskite_dimension_3D  0.000 0.292 0.077 0.000 0.000 0.551 1.000 1.000 0.127 0.043 0.022 0.010 0.123 0.854 1.000 0.025 0.123 0.119 0.081 0.355 0.000 0.067 0.089 0.201 0.077 0.010 0.000 1.000 0.060 0.039 0.000 0.000 0.000 0.009 0.002 1.000 0.000
Perovskite_dimension_3D_with_2D_capping_layer  0.000 0.292 0.087 0.000 0.000 1.000 1.000 1.000 0.129 0.000 0.008 0.000 0.000 0.008 0.025 1.000 0.006 0.619 0.676 0.145 0.000 0.035 0.126 0.671 0.008 0.007 0.000 1.000 0.014 0.006 0.000 0.000 0.000 0.000 0.000 1.000 1.000
Perovskite_composition_perovskite_ABC3_structure  0.000 0.000 0.000 0.000 0.000 1.000 1.000 1.000 0.172 0.021 0.008 0.006 1.000 0.113 0.123 0.006 1.000 0.553 0.151 0.282 0.000 0.065 0.105 0.506 0.000 0.000 0.000 1.000 0.037 0.066 0.000 0.000 0.000 0.011 0.000 1.000 0.000
Perovskite_composition_b_ions  0.000 0.000 0.494 0.000 0.000 0.171 0.505 0.157 0.144 0.133 0.043 0.000 0.000 0.121 0.119 0.619 0.553 1.000 0.439 0.426 0.395 0.108 0.019 0.536 0.058 0.000 0.000 1.000 0.146 0.108 0.559 0.143 0.000 0.123 0.000 0.943 0.000
Perovskite_composition_c_ions  0.000 0.082 0.366 0.000 0.000 0.221 0.473 0.166 0.134 0.066 0.009 0.000 0.241 0.083 0.081 0.676 0.151 0.439 1.000 0.378 0.377 0.284 0.076 0.602 0.167 0.006 0.000 0.000 0.089 0.035 0.405 0.034 0.000 0.071 0.000 0.000 0.081
Perovskite_composition_none_stoichiometry_components_in_excess  1.000 0.000 0.243 0.000 1.000 0.558 1.000 0.472 0.191 0.136 0.124 0.071 0.470 0.292 0.355 0.145 0.282 0.426 0.378 1.000 0.042 0.579 0.127 0.392 0.149 0.176 0.000 1.000 0.146 0.094 0.159 0.057 0.344 0.137 0.291 0.000 0.422
Perovskite_band_gap_graded  0.000 1.000 0.300 0.000 0.000 1.000 1.000 1.000 0.006 0.009 0.000 0.000 1.000 0.016 0.000 0.000 0.000 0.395 0.377 0.042 1.000 0.006 0.000 1.000 0.363 0.000 0.000 1.000 0.170 0.139 0.000 0.000 0.000 0.004 0.000 0.943 1.000
Perovskite_deposition_quenching_induced_crystallisation  0.000 0.222 0.500 0.010 0.000 0.265 0.569 0.396 0.348 0.062 0.022 0.026 0.550 0.067 0.067 0.035 0.065 0.108 0.284 0.579 0.006 1.000 0.991 0.252 0.090 0.010 0.065 0.000 0.067 0.076 0.232 0.045 0.014 0.011 0.029 0.943 0.230
Perovskite_deposition_quenching_media  0.000 0.171 0.185 0.000 0.044 0.246 0.657 0.045 0.129 0.083 0.119 0.058 0.479 0.062 0.089 0.126 0.105 0.019 0.076 0.127 0.000 0.991 1.000 0.501 0.061 0.295 0.097 0.707 0.032 0.182 0.418 0.000 0.000 0.084 0.085 0.777 0.305
Perovskite_deposition_quenching_media_additives_compounds  1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.000 0.335 0.241 0.353 0.000 0.975 0.609 0.201 0.671 0.506 0.536 0.602 0.392 1.000 0.252 0.501 1.000 0.378 0.000 0.000 0.000 0.229 0.199 0.370 0.379 1.000 0.516 0.917 0.000 1.000
Perovskite_deposition_thermal_annealing_atmosphere  0.000 0.300 0.296 0.030 0.000 0.377 0.000 0.402 0.034 0.013 0.003 0.062 0.588 0.032 0.077 0.008 0.000 0.058 0.167 0.149 0.363 0.090 0.061 0.378 1.000 0.116 0.032 0.000 0.269 0.178 0.190 0.067 0.000 0.062 0.103 0.970 0.035
Perovskite_deposition_solvent_annealing  0.000 0.000 0.011 0.012 0.000 0.104 1.000 1.000 0.047 0.112 0.013 0.012 0.000 0.001 0.010 0.007 0.000 0.000 0.006 0.176 0.000 0.010 0.295 0.000 0.116 1.000 0.874 0.000 0.065 0.126 0.470 0.000 0.003 0.006 0.000 1.000 0.000
Perovskite_deposition_solvent_annealing_solvent_atmosphere  0.000 0.000 0.017 0.000 0.000 1.000 1.000 1.000 0.018 0.108 0.000 0.070 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.065 0.097 0.000 0.032 0.874 1.000 0.000 0.088 0.083 0.418 0.000 0.140 0.080 0.000 1.000 0.000
Perovskite_surface_treatment_before_next_deposition_step  1.000 0.000 0.000 1.000 1.000 0.000 0.000 0.000 1.000 0.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 0.000 0.707 0.000 0.000 0.000 0.000 1.000 0.000 1.000 0.000 1.000 1.000 1.000 1.000 0.000 NaN
HTL_deposition_synthesis_atmosphere  0.000 0.000 0.059 0.000 0.000 0.310 0.000 0.000 0.023 0.035 0.051 0.027 0.685 0.027 0.060 0.014 0.037 0.146 0.089 0.146 0.170 0.067 0.032 0.229 0.269 0.065 0.088 0.000 1.000 0.684 0.485 0.046 0.000 0.051 0.000 1.000 0.163
HTL_deposition_solvents  0.000 0.000 0.022 0.000 0.000 0.424 0.000 0.000 0.031 0.063 0.077 0.016 0.398 0.000 0.039 0.006 0.066 0.108 0.035 0.094 0.139 0.076 0.182 0.199 0.178 0.126 0.083 1.000 0.684 1.000 0.936 0.066 0.000 0.058 0.083 1.000 0.568
HTL_deposition_solvents_mixing_ratios  1.000 1.000 0.000 1.000 1.000 0.975 1.000 0.000 0.194 0.409 0.262 0.148 0.967 0.000 0.000 0.000 0.000 0.559 0.405 0.159 0.000 0.232 0.418 0.370 0.190 0.470 0.418 0.000 0.485 0.936 1.000 1.000 1.000 0.223 1.000 1.000 1.000
Add_lay_front_stack_sequence  0.000 0.150 0.000 0.000 0.000 1.000 0.000 1.000 0.038 0.000 0.072 0.000 1.000 0.000 0.000 0.000 0.000 0.143 0.034 0.057 0.000 0.045 0.000 0.379 0.067 0.000 0.000 1.000 0.046 0.066 1.000 1.000 0.599 0.086 0.000 1.000 0.535
Add_lay_back  0.000 0.000 0.012 0.000 0.000 1.000 1.000 1.000 0.009 0.000 0.000 0.023 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.344 0.000 0.014 0.000 1.000 0.000 0.003 0.140 1.000 0.000 0.000 1.000 0.599 1.000 0.038 0.000 1.000 1.000
Encapsulation  0.000 0.341 0.025 0.000 0.000 0.117 0.893 0.000 0.039 0.072 0.042 0.057 0.561 0.021 0.009 0.000 0.011 0.123 0.071 0.137 0.004 0.011 0.084 0.516 0.062 0.006 0.080 1.000 0.051 0.058 0.223 0.086 0.038 1.000 0.042 1.000 0.211
JV_light_spectra  0.000 0.000 0.031 0.000 0.000 0.000 1.000 1.000 0.395 0.008 0.000 0.000 1.000 0.000 0.002 0.000 0.000 0.000 0.000 0.291 0.000 0.029 0.085 0.917 0.103 0.000 0.000 1.000 0.000 0.083 1.000 0.000 0.000 0.042 1.000 1.000 1.000
JV_scan_integration_time  1.000 0.000 0.801 1.000 1.000 0.000 0.000 0.000 0.322 0.841 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.943 0.000 0.000 0.943 0.943 0.777 0.000 0.970 1.000 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
JV_preconditioning_protocol  1.000 0.987 0.145 0.000 1.000 0.000 0.705 0.287 0.435 0.147 0.363 0.183 0.000 1.000 0.000 1.000 0.000 0.000 0.081 0.422 1.000 0.230 0.305 1.000 0.035 0.000 0.000 NaN 0.163 0.568 1.000 0.535 1.000 0.211 1.000 0.000 1.000

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
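
Nullity correlation can be read as the Pearson correlation between the "is present" indicators of two columns. The sketch below is one way to compute it with the standard library only. The function name `nullity_correlation` and the three short toy columns are made up for illustration and are not taken from the dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    A cell counts as missing when it is None. Returns NaN when either
    column is entirely present or entirely missing, because the
    indicator then has zero variance.
    """
    a = [0.0 if value is None else 1.0 for value in col_a]
    b = [0.0 if value is None else 1.0 for value in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical values, for illustration only).
speed = [100, None, 50, None, 10, 200]
delay = [1, None, 2, None, 3, 4]               # missing exactly where speed is
note = [None, "a", None, "b", None, None]      # present only where speed is missing

print(nullity_correlation(speed, delay))       # same missingness pattern
print(nullity_correlation(speed, note))        # opposite missingness pattern
```

A value near 1 means the two variables tend to be filled in (or left empty) together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the patterns are unrelated.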

Sample

(index), Ref_publication_date, Cell_area_measured, Cell_architecture, Cell_flexible, Module, Module_area_total, Substrate_stack_sequence, ETL_stack_sequence, ETL_thickness, ETL_additives_compounds, ETL_deposition_procedure, ETL_surface_treatment_before_next_deposition_step, Perovskite_dimension_2D, Perovskite_dimension_3D, Perovskite_dimension_3D_with_2D_capping_layer, Perovskite_composition_perovskite_ABC3_structure, Perovskite_composition_a_ions, Perovskite_composition_a_ions_coefficients, Perovskite_composition_b_ions, Perovskite_composition_b_ions_coefficients, Perovskite_composition_c_ions, Perovskite_composition_c_ions_coefficients, Perovskite_composition_none_stoichiometry_components_in_excess, Perovskite_additives_compounds, Perovskite_additives_concentrations, Perovskite_thickness, Perovskite_band_gap, Perovskite_band_gap_graded, Perovskite_pl_max, Perovskite_deposition_number_of_deposition_steps, Perovskite_deposition_procedure, Perovskite_deposition_synthesis_atmosphere, Perovskite_deposition_solvents, Perovskite_deposition_solvents_mixing_ratios, Perovskite_deposition_quenching_induced_crystallisation, Perovskite_deposition_quenching_media, Perovskite_deposition_quenching_media_mixing_ratios, Perovskite_deposition_quenching_media_additives_compounds, Perovskite_deposition_thermal_annealing_temperature, Perovskite_deposition_thermal_annealing_time, Perovskite_deposition_thermal_annealing_atmosphere, Perovskite_deposition_solvent_annealing, Perovskite_deposition_solvent_annealing_solvent_atmosphere, Perovskite_deposition_after_treatment_of_formed_perovskite, Perovskite_surface_treatment_before_next_deposition_step, HTL_stack_sequence, HTL_thickness_list, HTL_additives_compounds, HTL_additives_concentrations, HTL_deposition_procedure, HTL_deposition_synthesis_atmosphere, HTL_deposition_solvents, HTL_deposition_solvents_mixing_ratios, Backcontact_stack_sequence, Backcontact_thickness_list, Backcontact_deposition_procedure, Add_lay_front_stack_sequence, Add_lay_back, Encapsulation, Encapsulation_stack_sequence, JV_light_intensity, JV_light_spectra, JV_scan_speed, JV_scan_delay_time, JV_scan_integration_time, JV_preconditioning_protocol, JV_preconditioning_time, JV_preconditioning_potential, JV_default_PCE
02015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1I3NaNNaNNaNNaN1.27FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.00
12015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br; I0.3; 2.7NaNNaNNaNNaNNaNFalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.00
22015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br; I1.5; 1.5NaNNaNNaNNaNNaNFalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.13
32015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br; I2.7; 0.3NaNNaNNaNNaNNaNFalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.12
42015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br3NaNNaNNaNNaN1.75FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.10
52015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1I3NaNSnF20.2NaN1.27FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN1.66
62015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br; I1; 2NaNSnF20.2NaN1.37FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN1.67
72015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br; I2; 1NaNSnF20.2NaN1.65FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN1.56
82015-01-060.2nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-mp65.0 | nanUnknownSpray-pyrolys | Spin-coatingNaNFalseTrueFalseTrueCs1Sn1Br3NaNSnF20.2NaN1.75FalseNaN1Spin-coatingN2DMSO1FalseUnknownNaNNaN10010.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi(CF3SO2)2N; TBPNaNSpin-coatingUnknownUnknownNaNAu90.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN0.95
92017-06-060.1nipFalseFalseNaNSLG | FTOTiO2-c | TiO2-nwnan | 400.0UnknownCBD | HydrothermalNaNFalseTrueFalseTrueMA1Pb1I3NaNCl0.25NaN1.6False775.01Spin-coatingArDMF1FalseUnknownNaNNaN11060.0UnknownFalseUnknownNaNNaNSpiro-MeOTADNaNLi-TFSI; TPBNaNSpin-coatingUnknownUnknownNaNAu80.0EvaporationUnknownFalseFalseUnknown100.0AM 1.5NaNNaNNaNNaNNaNNaN10.30
,Ref_publication_date,Cell_area_measured,Cell_architecture,Cell_flexible,Module,Module_area_total,Substrate_stack_sequence,ETL_stack_sequence,ETL_thickness,ETL_additives_compounds,ETL_deposition_procedure,ETL_surface_treatment_before_next_deposition_step,Perovskite_dimension_2D,Perovskite_dimension_3D,Perovskite_dimension_3D_with_2D_capping_layer,Perovskite_composition_perovskite_ABC3_structure,Perovskite_composition_a_ions,Perovskite_composition_a_ions_coefficients,Perovskite_composition_b_ions,Perovskite_composition_b_ions_coefficients,Perovskite_composition_c_ions,Perovskite_composition_c_ions_coefficients,Perovskite_composition_none_stoichiometry_components_in_excess,Perovskite_additives_compounds,Perovskite_additives_concentrations,Perovskite_thickness,Perovskite_band_gap,Perovskite_band_gap_graded,Perovskite_pl_max,Perovskite_deposition_number_of_deposition_steps,Perovskite_deposition_procedure,Perovskite_deposition_synthesis_atmosphere,Perovskite_deposition_solvents,Perovskite_deposition_solvents_mixing_ratios,Perovskite_deposition_quenching_induced_crystallisation,Perovskite_deposition_quenching_media,Perovskite_deposition_quenching_media_mixing_ratios,Perovskite_deposition_quenching_media_additives_compounds,Perovskite_deposition_thermal_annealing_temperature,Perovskite_deposition_thermal_annealing_time,Perovskite_deposition_thermal_annealing_atmosphere,Perovskite_deposition_solvent_annealing,Perovskite_deposition_solvent_annealing_solvent_atmosphere,Perovskite_deposition_after_treatment_of_formed_perovskite,Perovskite_surface_treatment_before_next_deposition_step,HTL_stack_sequence,HTL_thickness_list,HTL_additives_compounds,HTL_additives_concentrations,HTL_deposition_procedure,HTL_deposition_synthesis_atmosphere,HTL_deposition_solvents,HTL_deposition_solvents_mixing_ratios,Backcontact_stack_sequence,Backcontact_thickness_list,Backcontact_deposition_procedure,Add_lay_front_stack_sequence,Add_lay_back,Encapsulation,Encapsulation_stack_sequence,JV_light_intensity,JV_light_spectra,JV_scan_speed,JV_scan_delay_time,JV_scan_integration_time,JV_preconditioning_protocol,JV_preconditioning_time,JV_preconditioning_potential,JV_default_PCE
42487,2019-07-12,0.101,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Pb,1,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,1.950000
42488,2019-07-12,0.113,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.005; 0.995,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,1.250000
42489,2019-07-12,0.102,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.005; 0.995,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,2.410000
42490,2019-07-12,0.120,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.1; 0.9,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,0.475000
42491,2019-07-12,0.074,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.1; 0.9,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,0.891892
42492,2019-07-12,0.082,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.1; 0.9,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,1.080000
42493,2019-07-12,0.083,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.02; 0.98,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,1.610000
42494,2019-07-12,0.092,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.02; 0.98,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,1.610000
42495,2019-07-12,0.093,nip,False,False,0.25,SLG | FTO,TiO2-c | TiO2-mp,50.0 | 250.0,TiCl4,Spin-coating | Spin-coating >> Hydrothermal,NaN,False,True,False,True,MA,1.0,Ag; Pb,0.05; 0.95,Br,3.0,Stoichiometric,Undoped,NaN,400.0,2.3,False,535.0,0,Spin-coating,N2,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,Undoped,100,60.0,N2,False,Unknown,NaN,NaN,Spiro-MeOTAD,200.0,Li-TFSI; TBP,14.2 mM; 8 vol%,Spin-coating,N2,Chlorobenzene,NaN,Au,60.0,Evaporation,MgF2,False,False,Cover glass-QDs,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,2.000000
42496,2022-04-11,0.105,Unknown,False,False,NaN,Unknown,Unknown,NaN,NaN,Unknown,NaN,False,False,False,False,NaN,NaN,Ag; Pb,0.005; 0.995,NaN,NaN,NaN,NaN,NaN,NaN,NaN,False,NaN,0,Unknown,Unknown,Unknown,NaN,False,Unknown,NaN,NaN,Unknown,Unknown,Unknown,False,Unknown,NaN,NaN,Unknown,NaN,NaN,NaN,Unknown,Unknown,Unknown,NaN,Unknown,NaN,Unknown,Unknown,False,False,Unknown,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,0.000000

Duplicate rows

Most frequently occurring

,Ref_publication_date,Cell_area_measured,Cell_architecture,Cell_flexible,Module,Module_area_total,Substrate_stack_sequence,ETL_stack_sequence,ETL_thickness,ETL_additives_compounds,ETL_deposition_procedure,ETL_surface_treatment_before_next_deposition_step,Perovskite_dimension_2D,Perovskite_dimension_3D,Perovskite_dimension_3D_with_2D_capping_layer,Perovskite_composition_perovskite_ABC3_structure,Perovskite_composition_a_ions,Perovskite_composition_a_ions_coefficients,Perovskite_composition_b_ions,Perovskite_composition_b_ions_coefficients,Perovskite_composition_c_ions,Perovskite_composition_c_ions_coefficients,Perovskite_composition_none_stoichiometry_components_in_excess,Perovskite_additives_compounds,Perovskite_additives_concentrations,Perovskite_band_gap_graded,Perovskite_deposition_number_of_deposition_steps,Perovskite_deposition_procedure,Perovskite_deposition_synthesis_atmosphere,Perovskite_deposition_solvents,Perovskite_deposition_solvents_mixing_ratios,Perovskite_deposition_quenching_induced_crystallisation,Perovskite_deposition_quenching_media,Perovskite_deposition_quenching_media_additives_compounds,Perovskite_deposition_thermal_annealing_temperature,Perovskite_deposition_thermal_annealing_time,Perovskite_deposition_thermal_annealing_atmosphere,Perovskite_deposition_solvent_annealing,Perovskite_deposition_solvent_annealing_solvent_atmosphere,Perovskite_deposition_after_treatment_of_formed_perovskite,Perovskite_surface_treatment_before_next_deposition_step,HTL_stack_sequence,HTL_thickness_list,HTL_additives_compounds,HTL_additives_concentrations,HTL_deposition_procedure,HTL_deposition_synthesis_atmosphere,HTL_deposition_solvents,HTL_deposition_solvents_mixing_ratios,Backcontact_stack_sequence,Backcontact_thickness_list,Backcontact_deposition_procedure,Add_lay_front_stack_sequence,Add_lay_back,Encapsulation,Encapsulation_stack_sequence,JV_light_intensity,JV_light_spectra,JV_scan_speed,JV_scan_delay_time,JV_scan_integration_time,JV_preconditioning_protocol,JV_preconditioning_time,JV_preconditioning_potential,JV_default_PCE,# duplicates
891,2019-01-18,0.09,nip,False,False,NaN,SLG | FTO,TiO2-c | TiO2-mp,NaN,Unknown,Spin-coating | Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,2,Spin-coating >> Spin-coating,Air >> Air,DMF >> IPA,1 >> 1,False,Unknown,NaN,100.0 >> 100.0,10.0 >> 10.0,Unknown,False,Unknown,NaN,NaN,Spiro-MeOTAD,NaN,Li-TFSI; TBP,NaN,Spin-coating,Unknown,Unknown,NaN,Au,80.0,Sputtering,Unknown,False,False,Unknown,100.0,AM 1.5,NaN,NaN,NaN,Cooling,NaN,NaN,NaN,10
285,2017-03-03,0.09,pin,False,False,NaN,SLG | FTO,PCBM-60 | BCP,100.0 | nan,Unknown,Spin-coating | Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,N2,DMSO; GBL,3; 7,True,Chlorobenzene,NaN,70.0,30.0,Unknown,False,Unknown,NaN,NaN,NiO-c,70.0,Cu,NaN,Spin-coating,Unknown,Unknown,NaN,Ag,150.0,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,100.0,NaN,NaN,NaN,NaN,NaN,NaN,9
1198,2019-10-17,0.16,nip,False,False,NaN,SLG | ITO,SnO2-c,NaN,Unknown,Unknown,NaN,False,True,False,True,Cs; FA; MA,0.05; 0.79; 0.16,Pb,1,Br; I,0.51; 2.49,PbI2,NaN,NaN,False,1,Spin-coating,Unknown,DMF; DMSO,4; 1,True,Chlorobenzene,NaN,100.0,60.0,Unknown,False,Unknown,NaN,NaN,Spiro-MeOTAD,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Au,NaN,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,200.0,NaN,NaN,NaN,NaN,NaN,NaN,9
357,2017-07-14,0.10,pin,False,False,NaN,SLG | ITO,PCBM-60 | BCP,NaN,Unknown,Spin-coating | Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,MA,HPA,NaN,False,1,Spin-coating,Unknown,DMF,1,False,Unknown,NaN,100,5.0,Unknown,False,Unknown,NaN,NaN,PEDOT:PSS,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Ag,80.0,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,500.0,NaN,NaN,Potential biasing,2.0,1.2,14.50,8
358,2017-07-14,0.10,pin,False,False,NaN,SLG | ITO,PCBM-60 | BCP,NaN,Unknown,Spin-coating | Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,Unknown,DMSO; GBL,3; 7,True,Toluene,NaN,100,15.0,Unknown,False,Unknown,NaN,NaN,PEDOT:PSS,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Ag,80.0,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,500.0,NaN,NaN,Potential biasing,2.0,1.2,10.50,8
838,2018-11-30,0.04,nip,False,False,NaN,SLG | FTO,TiO2-c,70.0,ZnCl2,Dipp-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,N2,DMSO; GBL,3; 7,True,Toluene,NaN,100.0,10.0,Unknown,False,Unknown,NaN,NaN,Spiro-MeOTAD,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Au,80.0,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,NaN,NaN,NaN,NaN,NaN,NaN,NaN,8
804,2018-11-03,0.16,nip,False,False,NaN,SLG | FTO,TiO2-c | TiO2-mp,NaN,Unknown | Li-TFSI,Spray-pyrolys | Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,N2,DMSO,1,True,Chlorobenzene,NaN,100.0,45.0,Unknown,False,Unknown,NaN,NaN,PTAA,NaN,Li-TFSI; TBP,NaN,Spin-coating,Unknown,Unknown,NaN,Au,80.0,Evaporation,Unknown,False,False,Unknown,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,7
106,2015-11-25,0.10,pin,False,False,NaN,SLG | ITO,PCBM-60,50.0,Unknown,Spin-coating,NaN,False,True,False,True,FA,1,Pb,1,Br; I,3,MAI,NaN,NaN,False,2,Spin-coating >> Spin-coating,Unknown,DMF >> IPA,1 >> 1,False,Unknown,NaN,110.0 >> 110.0,1.0 >> 5.0,Unknown,False,Unknown,NaN,NaN,PEDOT:PSS,40.0,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Al,100.0,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,NaN,NaN,NaN,NaN,NaN,NaN,NaN,6
393,2017-08-15,0.10,nip,False,False,NaN,SLG | FTO,TiO2-c,50.0,Unknown,Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,Air,DMF,1,True,He,NaN,100.0,10.0,Unknown,False,Unknown,NaN,NaN,Spiro-MeOTAD,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Au,NaN,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,NaN,NaN,NaN,NaN,NaN,NaN,17.08,6
394,2017-08-15,0.10,nip,False,False,NaN,SLG | FTO,TiO2-c,50.0,Unknown,Spin-coating,NaN,False,True,False,True,MA,1,Pb,1,I,3,NaN,NaN,NaN,False,1,Spin-coating,Air,DMF,1,True,N2,NaN,100.0,10.0,Unknown,False,Unknown,NaN,NaN,Spiro-MeOTAD,NaN,NaN,NaN,Spin-coating,Unknown,Unknown,NaN,Au,NaN,Evaporation,Unknown,False,False,Unknown,100.0,AM 1.5,NaN,NaN,NaN,NaN,NaN,NaN,17.98,6
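
A "Most frequently occurring" table like the one above can be approximated directly in pandas. The following is a minimal sketch, not the profiling tool's own implementation: the function name `most_frequent_duplicates` and the `top` parameter are hypothetical, and it assumes every column passed in is hashable. The duplicate table lists four fewer variables than the sample tables, which may correspond to the four "Unsupported" variables in the overview; under that assumption, such columns would need to be dropped before calling the function.

```python
import pandas as pd


def most_frequent_duplicates(df: pd.DataFrame, top: int = 10) -> pd.DataFrame:
    """Return each distinct duplicated row once, with a '# duplicates' count.

    Hypothetical helper for illustration; assumes all columns are hashable.
    """
    # keep=False marks every member of a duplicated group, not only the repeats
    dup = df[df.duplicated(keep=False)]
    counts = (
        # dropna=False keeps groups whose key contains NaN (requires pandas >= 1.1)
        dup.groupby(list(df.columns), dropna=False, sort=False)
        .size()
        .reset_index(name="# duplicates")
    )
    # Most frequent duplicated rows first, truncated to the requested number
    return counts.sort_values("# duplicates", ascending=False).head(top)


if __name__ == "__main__":
    # Toy data only; these values are not taken from the report
    toy = pd.DataFrame(
        {
            "a": [1, 1, 2, 1, 2, 3],
            "b": ["x", "x", None, "x", None, "z"],
        }
    )
    print(most_frequent_duplicates(toy))
```

Note that missing values are treated as equal to each other when rows are compared, so rows that differ only in which cells are NaN are not merged, while rows with NaN in the same cells are counted together.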